Can AI Create Websites and UIs and Take Over Experimentation Designers' Jobs?
As a UX/UI designer who works in the experimentation industry, I am bombarded every day by articles telling me that my job will be taken by AI.
So I decided to do some research on the topic to understand the current (November 2023) state of AI in our industry and the probable near future for AI-driven UI design tools.
If this gets a bit philosophical at times, that's on me. I'm not an AI expert, but I do have decades of interest in artificial intelligence, cognition, and perception.
What AI?
AI is a really broad field of science. Since its beginnings, it has gone through cycles of optimism and disappointment. We are now in the most optimistic stage yet, with deep learning delivering amazing results to the general public.
Its high-profile applications are everywhere: science, the automotive industry, video games, internet search, content recommendation, speech recognition, and more.
Some of these systems have existed for a while now. But the real excitement and hype today is all about generative AI, and especially:
- Large Language Models: ChatGPT, Bard, and Grok.
- Text-to-image Diffusion models: Stable Diffusion, Midjourney, DALL-E, and others.
- Video-generating models like Veed.io, Synthesia, and RunwayML.
For AI to take our jobs as UX/UI designers, we would need a generative AI that could create a full user experience, a full website, just from a really clear prompt. Is this possible right now? Let’s see:
Generative AI
Large Language Models (LLMs) are trained on text. Their input and output are text only.
These models achieve amazing results by “simply” generating the most likely next word given the context and input.
As I’m typing this, Google Docs offers me a word that is just right in this sentence. But the fact that the same method is so successful in creating valuable, informed, and correct text, as well as “understanding” natural language input, has made many believe these models are capable of almost anything. But are they?
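To make “the most likely next word” concrete, here is a minimal sketch that asks the small, open GPT-2 model (via Hugging Face’s transformers library, a stand-in for the far larger, closed production models) for its top next-token guesses:

```python
# A toy illustration of "predict the most likely next word",
# using the small open GPT-2 model. This is only a sketch of the idea;
# the models behind ChatGPT or Google Docs are far larger and closed.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "As I'm typing this, Google Docs offers me a"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the *next* token after the prompt
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r}: {prob.item():.3f}")
```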
Generative AI is built on pattern recognition. When you use a text-to-image diffusion model to generate an image, the machine starts from an image of pure random noise and then, step by step, removes a little of that noise. At each step it nudges the image toward something that matches the description (the prompt) more closely, based on the patterns it learned in training.
After lots and lots of these denoising iterations, the remaining image matches the description.
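If you want to see that denoising loop in action, a minimal sketch with an open model through Hugging Face’s diffusers library looks like this (the model id and step count are just examples I picked; Midjourney and DALL-E are closed services with their own interfaces):

```python
# Minimal text-to-image sketch with an open diffusion model.
# Internally the pipeline starts from random noise and removes it
# step by step, guided by the prompt, over `num_inference_steps` steps.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example public checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a GPU is available

prompt = "a minimalistic login screen for a digital marketing website"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("login_screen.png")
```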
The success of these two types of models raises the question: since website or app UI designs are basically text, images, and shapes, can’t these models successfully create full UI layouts based on our prompt? At first guess, this should be pretty straightforward. There goes our job.
Can’t we just describe the desired user experience and get a full new shiny clickable prototype with all the content?
Websites should be simple to generate. GPT-3.5 already codes and writes, while Midjourney produces pictures. Just combine them and be done, right?
However, the reality isn’t that simple.
Can Text-to-Image Diffusion Models Design Websites?
I’ll start with the worst (and oldest) example. I used DALL-E 2 to design a login page. The prompt was: “Create a minimalistic login screen for an online digital marketing website called CXL”. I also added which colors it should use, and so on. This was the result:
CILX CLOX, anyone? What can we see here? We can see that DALL-E was trained on images, not text. So I got only images of text that made little sense. Other than that, it almost resembles something like a login modal. Usable? Not really.
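For the record, the experiment itself is trivial to reproduce; here is a sketch using the OpenAI Python client (the model name and size are assumptions on my part, and your results will vary):

```python
# Reproducing the DALL-E 2 login-screen test via the OpenAI API (v1.x client).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-2",
    prompt=(
        "Create a minimalistic login screen for an online digital "
        "marketing website called CXL"
        # ...plus the color instructions, omitted here
    ),
    n=1,
    size="1024x1024",
)
print(response.data[0].url)  # link to the generated image
```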
So I looked for other examples of AI-generated layouts made by more advanced tools like Midjourney. The results are more appealing. But not more usable.
The image part is more “beautiful”: if your prompt includes “beautiful”, you are more likely to get results that converge toward what was labeled “beautiful” in the training set. But is it usable in practice? Is it the product you are actually selling?
The text is still useless, but at least the first part of it makes some sense 😉 The UI elements below, however… make even less sense.
Some might say these are just bad examples. So I tried to understand why this happens, especially with text.
The reason (as I understand it) is quite simple: image generators can’t create meaningful text because it’s really hard to predict the content of text if you’re an AI model trained on images of text. For instance, these two images are really similar, yet they contain totally different texts.
Small changes in text make the whole meaning of the text significantly different.
Moreover, to train an image model successfully on text like this, we would need all the images labeled with the text itself. So it makes little sense to build such models when we can use character recognition to first extract text strings from the image, then use LLMs to work with those strings.
That’s why image-generating AI models themselves are unlikely to come up with meaningful text in the short term … or maybe ever.
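The OCR-then-LLM division of labor mentioned above would look roughly like this (pytesseract and the OpenAI client are just example choices; the file name, model, and prompt are hypothetical):

```python
# Step 1: character recognition turns the image of text into a string.
# Step 2: a language model works with the string, not the pixels.
from PIL import Image
import pytesseract
from openai import OpenAI

client = OpenAI()

extracted_text = pytesseract.image_to_string(Image.open("screenshot.png"))

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Rewrite this login-screen copy so it makes sense:\n{extracted_text}",
    }],
)
print(response.choices[0].message.content)
```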
Can Large Language Models Design Websites?
So, I tried the other way around. Ultimately, a website is just code (and some images). Even the SVG icons are just code, and I had some success generating SVG code with GPT-3.5. So it should be possible to generate a website if the AI model is trained on strings of code.
So I asked GPT-3.5 to “write me an HTML / inline CSS code for a homepage hero image in the style of Everlane.com”
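(For reproducibility, this is roughly the call behind that experiment, using the OpenAI Python client; the model name and file handling are my assumptions, and the output changes from run to run.)

```python
# Asking GPT-3.5 for a homepage hero section, exactly as in the prompt above.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "write me an HTML / inline CSS code for a homepage "
                   "hero image in the style of Everlane.com",
    }],
)

with open("hero.html", "w") as f:
    f.write(response.choices[0].message.content)  # open in a browser to judge it
```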
This is what it generated (I had to add the image myself):
It is not too bad. However, it misses almost everything a UI designer is paid to do. And this was the case with my other prompts too. It had some success generating an HTML sketch of what a typical layout for a section could look like, with some text that would be typical in that context.
You could then use a tool (for example, Builder.io’s HTML-to-Figma plugin) to import this into Figma and edit it further. But which professional would really use this? For what I do now, this suggests a really limited use case. It might, however, be helpful for a non-designer who needs something to start with.
At Speero, we design hundreds of experiments a month for various clients. The experiences we create have to be fully integrated into the client’s ecosystem. This doesn’t simply mean that if we design an A/B test we have to match the style of the site.
We need to use the Design System the site was built in and sometimes our workflow needs to adapt to the client’s delivery system.
The experience itself has to be consistent with how the site works elsewhere. Even more crucial: our design is informed by hours of specific research and large amounts of data. There is simply no use for a “typical” layout sketch in the highly specific cases most professional designers work on.
But there seems to be an even more basic problem here. The layouts generated with LLMs are not even “beautiful”. In fact, it seems that looks weren’t a concern at all. This is hardly surprising.
It can be hard to get training datasets of code labeled by the features of their output. More precisely: LLMs are trained on strings of text (code), not on labels describing the experience of the user who is looking at the rendered output of that HTML/CSS code.
While in the case of an image “beautiful” can be a direct feature described in a label, in the case of HTML code the “beauty” of the rendered result lies on a different “layer of features”, one that is not directly accessible from within the code’s own set of features.
We would need a system that bridges the gap between these layers of features.
Multimodal AI: A Step Forward
So we need to combine the systems we have talked about: large language, text-to-image, and image-to-text models. Text recognition has also been around for a long time, and so has voice recognition.
GPT-4V tries to combine all of them: text recognition, voice recognition, a large language model, and a text-to-image model. It’s a lot more than a chatbot. It is a multimodal AI system.
This multimodal AI integrates all these systems so that they work together. It can take in an image, recognize it, and generate a text description, then use that description as a prompt for its large language model to create text that talks about the image. It can also read the text on an image and hand it to the language-processing part.
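A rough sketch of that kind of chained flow, written as two separate calls (the model names reflect the API as it looked in late 2023 and are assumptions; the image URL is hypothetical):

```python
# Step 1: the vision-capable model describes an image (and reads its text).
# Step 2: the description is handed to a plain language model as text.
from openai import OpenAI

client = OpenAI()

description = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this screenshot, including any text you can read."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
).choices[0].message.content

critique = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Suggest UX improvements for this screen:\n{description}",
    }],
).choices[0].message.content

print(critique)
```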
What needs to be noted here is that the models/modules of this AI still appear to be trained separately. One deals with images, another with text, and so on, and the connection between them is at the surface level: one can serve as input to the other, but the text module is not trained on the features of images. While it is a logical step toward full integration, it doesn’t yet solve the previously outlined problems.
Full integration could look a bit paradoxical: The language module would know all that the image module knows, encompassing it all. And the image module would encompass the language module.
UI Design: Layers of Representation
Be warned, this part is a bit philosophical and debatable, and I reserve the right to be wrong. But let’s try to come up with some “layers of features” in the case of UI design:
- The top layer (or the input layer) would be the desired user experience that you ideally describe in natural language. But you could also use a more formalized input of UX principles—descriptions of look and feel. Or you could describe the business problem and the system would propose UX solutions and finally layouts.
- The second layer is the rendered layout. A rendered design is not necessarily the same as the user experience. You can describe most features of how a design looks without knowing anything about what problem it tries to solve or how people use it.
- The third layer is the code. This seems solvable from one side: some tools already generate code from a design, although generating pieces of code that work well in an existing ecosystem can be a very big challenge. But since we are focusing on UI design here, let’s not go there.
- Design Systems can be considered a layer as well. Creating a layout is one thing. Integrating it with the strict rules of an existing design system is a challenge.
These layers do not necessarily represent how such a UI design AI would operate; rather, they try to show the problem of representation. A feature on one “layer” (the one we use for training) is so distantly connected to a feature on another layer that our current models seem to get lost.
So, Are We Going to Lose Our Jobs?
I’m not saying that our job will never be taken: the future changes quickly nowadays. The first part of my post talks about the problems we face if we want to build AI-powered design tools. So now let’s see what we can still do:
Current AI tools can already help us on these individual “layers”. They can generate content, starting with text. While you may not want to use AI to write your marketing content, it can be helpful for UI copy, where more familiar and generic wording can be an advantage.
AI can generate images and even SVG icons. It can replace stock photos to an extent.
You can create layout sketches and images of layouts for inspiration.
However, you can’t do all of this in a single tool. Selecting the tools and applying them to the use case is a designer’s job. We can only use different tools to generate individual layers.
There are lots of AI-powered tools and the goal of this article isn’t to examine and compare them. If you find one that does something that seems to contradict what I wrote, let me know!
However, there are some mainstream tools that we can use, mostly for content.
But can we do anything UI-related?
I was inspired to write this post by the talk given by the Diagram team at Figma’s Config conference. Since Figma is betting on Diagram for AI integration, it might be worth a listen. They don’t just promote their product; they also explain the problems they have faced (and still face) while developing their tools, and their ideas for overcoming some of the problems I mentioned earlier.
Diagram’s current tools allow you to do a few cool things:
- SVG icons for anything (Icon generation).
- Create images for your site (Image generation).
- Create copy instead of Lorem Ipsum (Copy generation).
- Name your layers in Figma so you don’t need to.
These are mostly done using one existing AI model and integrating it into Figma. But their plans for the future are much more interesting.
Bridge the Gap
They are building tools that do the following:
- AI auto assist: While you design, an assistant offers you options to complete your work.
- AI autocomplete: You design a certain element and the AI completes it with the most likely next element(s).
- Generate an editable layout from a prompt that integrates with your design system.
- Offer variations of your layout.
- Create high-quality glyphs from text prompts.
The methods outlined by the team are attempting to bridge the gap between layers of features:
LLMs are not trained on Figma design systems, but if the design system or a layout design is exported from Figma to HTML/CSS code and used as part of the prompt, then you are in the domain of LLMs.
You then import the generated code back into Figma and can use it right away. This clever idea bridges the feature gap with automation. You could probably call it a multimodal AI built for design.
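A rough sketch of that idea outside of Figma, just to make the mechanics concrete (the file names, model, and prompt are hypothetical; Diagram’s actual tooling does this inside Figma rather than through a script like this):

```python
# "Bridge the gap": export the design system as CSS, put it in the prompt,
# and ask the LLM for a layout that only uses those classes and tokens.
from openai import OpenAI

client = OpenAI()

with open("design_system_export.css") as f:  # exported from Figma
    design_system_css = f.read()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            "Using ONLY the classes and tokens defined in this CSS:\n\n"
            f"{design_system_css}\n\n"
            "generate the HTML for a pricing section with three tiers."
        ),
    }],
)

with open("pricing_section.html", "w") as f:
    f.write(response.choices[0].message.content)
# The result can then be pulled back into Figma (e.g. with an HTML-to-Figma
# importer) and edited as native layers.
```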
They are working on a model for layers, and a specific UI kit that works well with their AI system called UI-AI.
Most of these exciting features are “coming soon”, so we have yet to see how usable and reliable they will be. Will they take our job? As the Diagram folks tell you: they will allow many more non-designers to start designing, and they will relieve designers of the monotonous parts of the work. New technologies tend to create jobs, not take them.
The (Far) Future?
The question of “when” is still unanswered. How would it be possible to build AI systems that could really render the designer’s job unnecessary?
Perhaps a huge platform like Figma could build machine learning models that are trained not on text only but on the designs themselves: not a Large Language Model, but a Large Design Model.
In such a model, descriptions, comments, briefs, design principles, and specifications would be matched with design systems and layouts. There would be no feature gap to bridge, because the system could find and map direct connections between designs and briefs.
These systems could also learn from data that is not labeled: learn how designs usually look and behave. We could then have intelligent design systems that transform and build themselves as you instruct them. You input a problem, and you get a design solution.
More. Faster. Better?
One last question still remains though…
Would it really be better than us?
Faster, and more productive, for sure. But better?
This is also a more general question about AI-generated material. How much value does your content have if anyone can generate the same thing? The value of simple generic content, and of “beauty” itself, has been inflated away. Anyone can create a beautiful image and generate the text for a post.
After the inflation, the value that’s left is your own voice, your own message, your own real product. Everyone’s perspective is no one’s perspective. The conditions of a crowd are not a human’s condition. In a way, YOU are more important than before.
As we all know, every website already looks the same. How would a website look if it were built by a model trained on all those same-looking sites? What would be special about the user’s experience? Even if trained on the best possible practice, the output still remains best practice... at best. And what I learned on my first day at Speero (in fact, even before that) is that we don’t “believe” in best practices. Why?
Because past guidelines and biases don’t always translate to particular situations and niches. Even the most obvious and concrete best practices don’t include the whole breadth of all shopping behaviors, user intents, and decision-making factors.
This is why we employ statistics and data: to understand the specific factors at play and how our own experience shapes our test recommendations. That is a lot of gaps to bridge for a single AI.
So it seems that while machine learning tools will make our job easier and more productive, they won’t be taking it any time soon. We need to adopt these tools and understand their shortcomings so we can use them in the most productive way possible.
To fully automate the multi-layered field of UX/UI design, we would probably need something close to AGI (Artificial General Intelligence). But if we’d achieve that, designer jobs would be the least of our concerns.