DALL-E 2 is one of the latest AI systems that can create realistic artwork in seconds. It takes a text prompt in which the user describes the image they want to create. The image above was generated with the prompt:
Astronaut hacking the international space station with explosions in the background.
DALL-E 2’s input parser is designed to account for every noun, verb, and adjective in the prompt, so it is best to make prompts as descriptive as possible. Impressive as that is on its own, DALL-E can also take a generated image and create further variations of it.
Examples Of DALL-E 2
Now for some generations. Below are a few examples of the images you can create with simple variations on a prompt. The initial prompt we will consider is the following:
Despite the horrendous grammar, the DALL-E parser still understands the prompt and attempts to represent it. Notice also that one of the images depicts the race car as a hoverboard. The AI is not perfect, but it shines at efficiently generating variations on slightly altered prompts.
One important side note: DALL-E attempts to avoid generating copyrighted artwork. This can be seen with the prompt below, which references “Kirby”, a pink hero owned by Nintendo, a company known for strictly enforcing its copyrights.
An easy way to get the AI to generate images of beloved gaming icons is to use the provided upload feature. This can be seen below, where an uploaded image of Steve from Minecraft produced the following generations.
How To Use DALL-E 2 AI
From here, the only limitation is how descriptive the user is. As mentioned earlier, the input parser considers the full input, so add as much detail as possible to generate the desired image. The images below the prompt box are examples provided by the DALL-E team.
Hovering over an example image gives you the option to generate more images with the prompt used in that example. The final way to get started is to upload an image, which you can do by selecting the bold text “upload an image” under the prompt box. Interestingly, users who choose to upload an image to edit are shown a disclaimer. It reads as follows:
If you continue with the upload, you will be asked whether you wish to crop the photo into a square version. From there, you can choose to edit the photo or to create variations of it.
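The square crop step can be sketched in a few lines of Python. This is only an illustration: we do not know how DALL-E's own crop tool positions the square, so the helper below (the name `center_square_crop_box` is our own, hypothetical) simply assumes the largest square centered in the image. It returns a `(left, top, right, bottom)` box, the same order Pillow's `Image.crop` expects.

```python
def center_square_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for the largest centered square.

    Hypothetical helper: assumes a centered crop, which may differ from
    how DALL-E's upload tool actually places the square.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    side = min(width, height)          # the square can be no larger than the short edge
    left = (width - side) // 2         # trim evenly from the left and right
    top = (height - side) // 2         # trim evenly from the top and bottom
    return (left, top, left + side, top + side)


# A 1920x1080 landscape photo keeps its full height and loses 420 px per side.
print(center_square_crop_box(1920, 1080))  # (420, 0, 1500, 1080)
```

With Pillow installed, the returned box could be passed straight to `img.crop(...)` to produce the square upload.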
Side note: DALL-E 2 prefers a square aspect ratio when possible, possibly because artificial intelligence technology is built on matrix multiplication. A square image can be represented as a square matrix, which ties in with algorithms such as Strassen's algorithm: it multiplies two n × n matrices in roughly O(n^2.81) operations instead of the O(n^3) of the schoolbook method. From the user’s perspective, this could allow for faster image generation, since fewer steps are needed to produce the image.
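To make the Strassen reference concrete, here is a minimal textbook sketch in Python for n × n matrices where n is a power of two. It only illustrates the algorithm itself and is not a claim about how DALL-E 2 is implemented; the function name `strassen` and the power-of-two restriction are our own simplifying assumptions.

```python
Matrix = list[list[int]]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def strassen(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two n x n matrices (n a power of two) with Strassen's method."""
    n = len(a)
    if n == 0 or n & (n - 1) or len(b) != n:
        raise ValueError("both matrices must be n x n with n a power of two")
    if n == 1:
        return [[a[0][0] * b[0][0]]]

    h = n // 2

    def quads(m: Matrix) -> tuple[Matrix, Matrix, Matrix, Matrix]:
        # Split a matrix into its four h x h quadrants.
        return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
                [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])

    a11, a12, a21, a22 = quads(a)
    b11, b12, b21, b22 = quads(b)

    # Seven recursive products instead of the naive eight.
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))

    # Recombine the products into the four quadrants of the result.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)

    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The seven recursive products per level (instead of eight) are where the n^log2(7) ≈ n^2.81 bound comes from. Practical implementations usually fall back to ordinary multiplication below some block size, which this sketch omits for brevity.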
After choosing to edit the image, you will be brought to a new page with a basic built-in image editor. Though the options are limited, there is enough to remove unwanted elements from simple images, such as the tool on the Steve character below.
Copyright Concerns
Although OpenAI owns the images, you are free to use them commercially as long as you give credit. OpenAI is very clear that any attempt to mislead others about how the work was created (in particular, claiming it is human-made art) violates its terms of service. Another concern is that any uploaded artwork is added to the AI’s database, essentially permitting others to use the same image.
Ethics Of Training AI
It is understandable for uploads to be added to the database; however, the DALL-E 2 team appears to have made a huge ethical mistake. The team decided to perform a web scrape, or in layman's terms, to grab any image they could get their hands on. That’s right: if an image is indexed on Google, it could be in the database. In truth, not every scraped image is kept, because humans still have to go in and label the content of each image, and as a result a portion of them are discarded. Still, copyright-protected content will leak into the database over time, since realistically more and more images will be added to it.
The old mantra that what gets put on the internet stays on the internet may have a new meaning. If someone uploads their artwork to a personal blog, does that give others permission to download the image because it is publicly available? If so, can they print it out and sell the artwork? One perspective is that the DALL-E team stole countless pieces of artwork that were online and then fed them into a program that users pay to use. Another is that the team simply gathered publicly available information and built a bot that can allow even those who cannot draw to be creative.
Personal Remarks
From the perspective of a college student studying computer science, this is an amazing breakthrough in artificial intelligence. It is exciting because it gives us an AI that not only interprets unique human input but also creates a unique output. Interpreting unique human input is nothing new in itself; look at Google’s search bar, which almost never returns “no results found”. Just like in the Kirby example above, some outputs will be “random” or undesired, but DALL-E will always return the closest matching interpretation it has access to. This implies that the limits holding back DALL-E will diminish over time as more is added to the database. As AI research constantly improves, we will continue to see new advancements such as ChatGPT, a model that takes text input and replies with text. Its responses are somewhat limited in scope, since the AI again only knows what is put into the system; however, as time goes on these systems will only continue to improve.
From the perspective of a college student who studied 3D animation for four years in high school, this has the potential to be an amazing tool. An immediate use I could see in the field would be brainstorming new ideas or fleshing out an initial idea. Suppose a broad topic such as Halloween is given: we could take our unique idea, “A clown holding balloons in an abandoned fun house”, and get a few different takes that help refine our vision of the final product, allowing for a more efficient workflow.
Unfortunately, we do not live in a perfect world, and some serious downsides must be mentioned. Some people will take images from DALL-E and claim them as their own; this is inevitable, though in most cases it inflicts little to no harm on others. A system will likely be developed to create 3D models from text inputs. This should not discourage anyone from the field; instead, it should remind us that we adapt to the tools we are given. Such tools will reduce the hours a project needs as a whole if one person can complete a scene in the time it would normally take a team. While this could mean fewer jobs for the same project, it also lowers the barrier to entry, letting anyone create higher-quality projects with a bit of an assist from AI. While I do believe certain measures, such as adding copyright metadata to images, might help curb the issues we have today, I believe that as time goes on AI will be accepted just as we have accepted other advancements like the railroads, GPS, and the Internet.
The first step is simply to visit the website at https://openai.com/dall-e-2/, or, if you prefer, search for DALL-E 2 by OpenAI and arrive at the same page. Once you sign up for an account, you will be greeted with the screen below.