The software in the thing usually doesn't do much: wake up, connect to the access point, request an image from a server over HTTP, display it and go to sleep again. The free RAM inside the ESP8266 is only 40K or so, which is less than the 60K needed to store an 800x600 monochrome image (800 x 600 pixels at one bit per pixel is 60,000 bytes). That's why the controller streams the image to the display as it arrives instead of storing it first. This also means the image has to be offered in a streamable format, which is why the server software can't just output a standard PNG file.
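To make the streaming idea concrete, here is a minimal sketch in C of forwarding an image through a small buffer. It is not the actual firmware: the callback types, the 512-byte chunk size and the `demo_*` stand-ins are all assumptions made up for illustration, and the real code would read from the ESP8266 network stack and write to the display controller instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMG_WIDTH   800
#define IMG_HEIGHT  600
#define IMG_BYTES   (IMG_WIDTH * IMG_HEIGHT / 8)  /* 1 bit per pixel: 60000 bytes */
#define CHUNK_BYTES 512                            /* hypothetical buffer size, far below 40K */

/* Hypothetical callbacks: one pulls bytes from the network, one pushes them to the display. */
typedef size_t (*read_fn)(void *ctx, uint8_t *buf, size_t len);
typedef void (*write_fn)(void *ctx, const uint8_t *buf, size_t len);

/* Forward the image chunk by chunk; only CHUNK_BYTES of RAM are ever used.
 * Returns the number of bytes passed on to the display. */
size_t stream_image(read_fn rd, void *rd_ctx, write_fn wr, void *wr_ctx)
{
    uint8_t chunk[CHUNK_BYTES];
    size_t total = 0;

    while (total < IMG_BYTES) {
        size_t want = IMG_BYTES - total;
        if (want > CHUNK_BYTES)
            want = CHUNK_BYTES;

        size_t got = rd(rd_ctx, chunk, want);  /* may return fewer bytes than asked for */
        if (got == 0)
            break;                             /* connection closed or error: stop early */

        wr(wr_ctx, chunk, got);
        total += got;
    }
    return total;
}

/* Stand-ins so the sketch can be tried on a PC (not part of any real firmware). */
struct demo_state { size_t read_bytes; size_t written_bytes; };

size_t demo_read(void *ctx, uint8_t *buf, size_t len)
{
    struct demo_state *s = ctx;
    if (len > 100)
        len = 100;             /* mimic short reads from a network socket */
    memset(buf, 0xFF, len);    /* dummy pixel data */
    s->read_bytes += len;
    return len;
}

void demo_write(void *ctx, const uint8_t *buf, size_t len)
{
    struct demo_state *s = ctx;
    (void)buf;                 /* a real sink would clock these bytes into the display */
    s->written_bytes += len;
}
```

The point of the design is that memory use is fixed by the chunk size, not by the image size, which only works if the server sends the pixels in the order the display wants them.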
Of course, the display needs to be configured before first use: it has to know which access point to connect to, plus the password for it, if needed. It also needs to know the URL of the pixel server. To set these parameters, the ESP8266 executes a somewhat different routine on initial power-up. When the battery is pulled and put back in, the display shows a built-in image to indicate that the device is in configuration mode, and it puts up an access point of its own. Connect to that access point and you can use your web browser to set all the needed parameters.
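The choice between the two routines could look roughly like the sketch below. Everything in it is hypothetical: the enum and struct names, the field sizes, and the assumption that the firmware can tell a cold power-up from a timed wake-up. The fallback to configuration mode when settings are missing is also an addition for illustration, not something described above.

```c
#include <stddef.h>

/* Hypothetical: how the chip came out of reset. */
enum wake_cause { WAKE_POWER_ON, WAKE_FROM_SLEEP };

/* Hypothetical: the two routines the firmware can run. */
enum run_mode { MODE_CONFIG, MODE_FETCH_IMAGE };

/* The three settings the configuration page collects (sizes are assumptions). */
struct display_config {
    char ssid[33];        /* access point name, up to 32 characters plus NUL */
    char password[65];    /* may be empty for an open network */
    char image_url[128];  /* URL of the pixel server */
};

/* Decide what to do after waking up. */
enum run_mode choose_mode(enum wake_cause cause, const struct display_config *cfg)
{
    /* Battery was pulled and put back: show the built-in image and start the access point. */
    if (cause == WAKE_POWER_ON)
        return MODE_CONFIG;

    /* Assumed safety net: without an SSID or URL there is nothing to fetch. */
    if (cfg->ssid[0] == '\0' || cfg->image_url[0] == '\0')
        return MODE_CONFIG;

    /* Normal wake-up: connect, fetch the image, display it, sleep again. */
    return MODE_FETCH_IMAGE;
}
```

The password is deliberately not checked, since it is only needed when the access point asks for one.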