PlantUML is a great way to represent any technical architecture or workflow. It is an open format and is supported by a wide range of tools. I have used it in countless documents, presentations, README files, and PR descriptions. (Can there be an xkcd UML edition? Maybe.)
I found myself using ChatGPT a lot for ideation and mind mapping. One capability it lacked was the ability to render UML diagrams. It tries hard with DALL·E, but the results are really not usable.
Let’s work with an example of an auth flow in a mobile app. This is what you would get as of today with ChatGPT (GPT-4):
Notice that the UML code was perfect, but it had to be prompted to render the diagram, and the output was not useful. Let’s fix it.
ChatGPT plugin
First, I added the PlantUML dependency to a Spring project and deployed an API that takes UML text as input and returns the rendered image as output.
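The rendering endpoint itself boils down to a few lines. Here is a minimal sketch of that API, assuming Spring Boot and the `net.sourceforge.plantuml` dependency; the `/render` path and request shape are illustrative, not the exact code I deployed:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import net.sourceforge.plantuml.FileFormat;
import net.sourceforge.plantuml.FileFormatOption;
import net.sourceforge.plantuml.SourceStringReader;

import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RenderController {

    // Accepts raw PlantUML text ("@startuml ... @enduml") and returns a PNG.
    @PostMapping(value = "/render", consumes = MediaType.TEXT_PLAIN_VALUE)
    public ResponseEntity<byte[]> render(@RequestBody String umlSource) throws IOException {
        SourceStringReader reader = new SourceStringReader(umlSource);
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        // Render the first diagram in the source into the stream as a PNG image.
        reader.outputImage(png, new FileFormatOption(FileFormat.PNG));
        return ResponseEntity.ok()
                .contentType(MediaType.IMAGE_PNG)
                .body(png.toByteArray());
    }
}
```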
Next, we need to tell ChatGPT where the API is and how to call it. For this, I had to upload two public files to my server: the plugin manifest and the OpenAPI specification. There is good documentation on how to write them, so I will skip the details. Here are the files I have for reference:
Next, we deploy and test. Go to `Plugins` > `Plugin store` > `Develop your own plugin` and enter the URL.
Now the plugin is ready to use. With the same prompt, I get a properly rendered UML image:
Custom GPT
While developing the plugin is great, there are some serious limitations. Plugins are hard to discover and require open endpoints to be exposed. They also lack custom instructions: as you can see above, the model still has to be prompted to render the image. OpenAI seems to be moving away from plugins and is no longer accepting new submissions to the store. Custom GPTs address these concerns and are the way forward.
With a custom GPT, there is no need for public manifest files, as the actions can be configured within the UI. It also supports authentication on the API endpoints and custom instructions. In my case, I instructed it to render the UML image as soon as it sees any UML code. Here is the result:
Some things to consider if you are developing plugins or GPTs:
The API endpoints cannot return images. I had to save the image on my server and return its public URL inside a JSON response (a sketch of this appears after this list). This means that as a developer, you need to store these images and handle privacy yourself. In my case, the URL is just a UUID, and images live for only 48 hours.
For some types of diagrams, the `graphviz` dependency is required and needs to be installed on the server (or in the Docker image).
The “description” fields in the action configuration cannot be treated as plain text. The GPT interprets them to form API requests and parse responses. I copy-pasted a description and then spent a while debugging why the API was receiving JSON instead of plain text.
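To make the first point concrete, here is a hedged sketch of the URL-returning variant of the endpoint: the diagram is rendered to disk under a UUID file name, and the response is a small JSON payload the GPT can use to embed the image. The storage path, base URL, and field name are my assumptions, and the 48-hour cleanup would be a separate scheduled job.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;

import net.sourceforge.plantuml.FileFormat;
import net.sourceforge.plantuml.FileFormatOption;
import net.sourceforge.plantuml.SourceStringReader;

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RenderToUrlController {

    // Directory served as static content and its public base URL (assumptions).
    private static final Path IMAGE_DIR = Path.of("/var/www/diagrams");
    private static final String BASE_URL = "https://example.com/diagrams/";

    // Renders the diagram, stores it under a random UUID, and returns the URL
    // as JSON instead of the image bytes, since the GPT action cannot consume
    // binary responses.
    @PostMapping(value = "/render", consumes = MediaType.TEXT_PLAIN_VALUE,
                 produces = MediaType.APPLICATION_JSON_VALUE)
    public Map<String, String> render(@RequestBody String umlSource) throws IOException {
        String fileName = UUID.randomUUID() + ".png";
        Path target = IMAGE_DIR.resolve(fileName);
        try (OutputStream out = Files.newOutputStream(target)) {
            new SourceStringReader(umlSource)
                    .outputImage(out, new FileFormatOption(FileFormat.PNG));
        }
        return Map.of("imageUrl", BASE_URL + fileName);
    }
}
```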
Big thanks to Robert Reppel for the post that inspired this work.
To install the plugin, paste the URL: ai.getresolv.com