01 | Infrastructure


01.1 | Docker


Docker allows us to pack all dependencies and specifications of an application into a single Docker image and to deploy it easily on any Docker host. This means more flexibility, speed and reliability for deployments.

Docker is a container virtualization technology. It allows us to deploy applications inside containers, each of which runs its application in a mostly isolated environment. We store our images in an Artifactory instance hosted on the iteratec servers.

Docker also simplifies the deployment process significantly. Because all dependencies and the application are provided in one image, a container can be deployed on any Docker host. In the past, a server had to be prepared and all dependencies had to be installed before an application could be deployed. This often led to conflicts with other applications.


01.2 | CI


Continuous Integration is a process in which all developers merge their local code into one central code base on a regular basis. This helps to test the product early on and to find errors in the code proactively.

Every time new code is pushed to the central code base, the Continuous Integration system runs a series of quality assurance measures, such as executing a test suite or static code analysis. This verifies the overall code and application quality of the current state. The term was popularized by the agile programming methodology Extreme Programming.


01.3 | Artifactory


JFrog Artifactory is a repository for digital artifacts. We use it as a private Docker registry for storing our application images.

Artifactory persists the artifacts (in our case Docker images) created by developers. When a new version of Malsato is ready, our continuous integration environment pushes this new version as an image into Artifactory. Afterwards, the server on which our application is running downloads the new image and deploys it.

This enables us to reproduce the current state of the application at any time and also to access earlier versions.

02 | Content


02.1 | Data Scraping


In order to keep our menus up to date, we have to acquire the needed data from multiple sources. Each restaurant needs to be treated separately, since menus are published in different formats and channels. Currently we are able to fetch menus in PDF, HTML and Microsoft Word formats. These documents can be provided via URL or sent by email.


02.1.1 | PDF-Parsing


To parse PDF documents, we convert them into an XML structure. This allows us to extract menu data by using elements and coordinates.

PDF was developed for viewing documents that should be displayed identically on every platform. Such documents are therefore not machine-readable in the first place, and extracting the desired text is not a trivial task. Even if PDFs are made from the same template, the internal representation can differ from document to document. To overcome this issue, we locate reference points inside the document and use them to calculate the actual text coordinates.
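
The reference-point idea can be sketched as follows. This is only a minimal illustration, assuming the converted XML consists of `text` elements carrying `left` and `top` attributes; the element and attribute names, the sample content, the offsets and the helper functions are all hypothetical and not taken from the Malsato code base.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML as a PDF-to-XML converter might produce it.
SAMPLE_XML = """
<page>
  <text left="52" top="110">Weekly Menu</text>
  <text left="52" top="160">Monday</text>
  <text left="202" top="160">Pasta with tomato sauce</text>
  <text left="52" top="190">Tuesday</text>
  <text left="202" top="190">Vegetable curry</text>
</page>
"""

def find_anchor(root, label):
    """Return (left, top) of the first text element whose content equals label."""
    for el in root.iter("text"):
        if (el.text or "").strip() == label:
            return int(el.get("left")), int(el.get("top"))
    raise ValueError(f"reference point {label!r} not found")

def text_at_offset(root, anchor, dx, dy, tolerance=5):
    """Collect text located at anchor + (dx, dy), within a small tolerance."""
    ax, ay = anchor
    hits = []
    for el in root.iter("text"):
        x, y = int(el.get("left")), int(el.get("top"))
        if abs(x - (ax + dx)) <= tolerance and abs(y - (ay + dy)) <= tolerance:
            hits.append((el.text or "").strip())
    return hits

root = ET.fromstring(SAMPLE_XML)
# The heading serves as the reference point; dishes are addressed relative to it.
anchor = find_anchor(root, "Weekly Menu")
monday_dish = text_at_offset(root, anchor, dx=150, dy=50)
```

Because the dish is addressed relative to the heading rather than by absolute position, the lookup keeps working when a document generated from the same template is shifted on the page.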


02.1.2 | HTML-Parsing


Parsing HTML is done by navigating through the DOM (the structure of the site) until the desired content is reached.

The basic structure of a website rarely changes (except when the site receives an update). It is therefore possible to use certain classes and ids as reference points when navigating through the document. For example, all menus could be located in a div with the class menus, where every child div represents a different menu.
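
The example of a div with the class menus can be sketched like this. The HTML snippet and the `MenuParser` class are hypothetical, and only the Python standard library is used so that the sketch stays self-contained; it is not necessarily the library Malsato relies on.

```python
from html.parser import HTMLParser

# Hypothetical page fragment following the structure described above.
SAMPLE_HTML = """
<div class="menus">
  <div><h3>Menu 1</h3><p>Lentil soup</p></div>
  <div><h3>Menu 2</h3><p>Roast chicken</p></div>
</div>
"""

class MenuParser(HTMLParser):
    """Collects the text of every direct child div of <div class="menus">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # div nesting depth inside the container (0 = outside)
        self.menus = []   # one list of text fragments per menu

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if "menus" in classes:
                self.depth = 1
        else:
            self.depth += 1
            if self.depth == 2:   # direct child of the container
                self.menus.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.depth >= 2:
            self.menus[-1].append(text)

parser = MenuParser()
parser.feed(SAMPLE_HTML)
menus = [" - ".join(parts) for parts in parser.menus]
```

Each entry in `menus` corresponds to one child div. If the site changes its class names, only the check for `"menus"` has to be adapted.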


02.1.3 | E-Mail-Parsing


The email parser looks for new mails in the inbox and persists text and attachments for further processing.

The email service remembers which mails have already been checked and periodically looks for new ones. When a new mail arrives, the service decides whether it needs to be processed by checking the sender address against our database. If the check is positive, the mail's text and any attachments are persisted on the server.
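
The sender check and the extraction of text and attachments might look roughly like the sketch below. It assumes the raw mail is already available as a string; `KNOWN_SENDERS` is a hypothetical stand-in for the database lookup, and the addresses, the sample mail and `process_mail` are invented for illustration.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical stand-in for the sender addresses stored in the database.
KNOWN_SENDERS = {"kitchen@example.com"}

RAW_MAIL = """\
From: Canteen <kitchen@example.com>
To: menus@example.org
Subject: Menu for this week
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

Please find this week's menu attached.
--XYZ
Content-Type: application/pdf
Content-Disposition: attachment; filename="menu.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--XYZ--
"""

def process_mail(raw, known_senders):
    """Return (text, attachments) for known senders, or None to skip the mail."""
    msg = message_from_string(raw)
    _, sender = parseaddr(msg.get("From", ""))
    if sender.lower() not in known_senders:
        return None
    text_parts, attachments = [], {}
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        filename = part.get_filename()
        if filename:
            attachments[filename] = payload
        elif part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            text_parts.append(payload.decode(charset, errors="replace"))
    return "".join(text_parts).strip(), attachments

result = process_mail(RAW_MAIL, KNOWN_SENDERS)
```

Persisting the returned text and attachment bytes, and remembering which mails were already checked, are left out of the sketch.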


02.2 | Scheduler


The scheduler is responsible for invoking scheduled and periodic tasks on time.

The scheduler has to invoke two tasks: fetching the menu data from all restaurants and sending push notifications to registered chats.
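
A periodic scheduler of this kind can be sketched with Python's standard `sched` module. The task bodies, the intervals and the `every` helper are hypothetical, and a simulated clock is used so that the example finishes immediately; this is an illustration of the concept, not the scheduler Malsato actually uses.

```python
import sched

log = []

def fetch_all_menus():
    # Placeholder: the real task would run the parsers for every restaurant.
    log.append("fetch menus")

def send_push_notifications():
    # Placeholder: the real task would post today's menus to registered chats.
    log.append("send notifications")

# A simulated clock keeps the sketch deterministic; a real scheduler
# would use time.time and time.sleep instead.
clock = {"now": 0.0}

def fake_time():
    return clock["now"]

def fake_sleep(seconds):
    clock["now"] += seconds

scheduler = sched.scheduler(fake_time, fake_sleep)

def every(interval, task, repeats):
    """Run task after each interval, re-scheduling it until repeats is used up."""
    def run(remaining):
        task()
        if remaining > 1:
            scheduler.enter(interval, 1, run, argument=(remaining - 1,))
    scheduler.enter(interval, 1, run, argument=(repeats,))

every(3600, fetch_all_menus, repeats=2)          # hypothetical hourly fetch
every(5400, send_push_notifications, repeats=1)  # hypothetical one-off push
scheduler.run()
```

After `scheduler.run()` returns, `log` holds the tasks in the order in which they fired on the simulated clock.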

03 | Backend


03.1 | Python


The Python-based app is responsible for serving web content such as menu and restaurant information and for retrieving menu data from the restaurants. This is accomplished with the web framework Flask and the Jinja2 templating engine.

Python allowed us to rapidly develop the first prototypes of Malsato. Furthermore, Python provides a rich ecosystem of third-party libraries.

In order to handle higher request loads, the app is served by a Gunicorn server, which starts multiple instances of the app as separate processes so that a higher number of requests can be handled concurrently.
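
Gunicorn reads its settings from a Python file, so the multi-process setup could be configured roughly as follows. The file name, the bind address and the worker count are assumptions chosen for illustration, not the values used in production.

```python
# gunicorn.conf.py (hypothetical file; all values are illustrative)
import multiprocessing

# Address Gunicorn listens on; a reverse proxy would forward requests here.
bind = "127.0.0.1:8000"

# Number of worker processes. The Gunicorn documentation suggests
# (2 x CPU cores) + 1 as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1
```

Such a file would be passed on start-up with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` stands for an assumed module and Flask application object.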


03.2 | MongoDB


We chose the NoSQL database MongoDB to store our data. It stores data in the form of documents, using a format similar to JSON.

In contrast to a traditional relational database, MongoDB does not enforce a fixed schema. It accepts documents of arbitrary structure. Documents are only grouped into collections, and how to group them is the developer's choice. We need this flexibility to store menus from different providers, as each provider structures its menus differently.
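
To illustrate what differently structured documents in one collection mean, here is a small sketch. The field names and values are invented, and a plain Python list stands in for the collection so that the example runs without a database.

```python
import json

# Two hypothetical menu documents with different structures.
menu_a = {
    "restaurant": "Canteen A",
    "date": "2018-05-14",
    "dishes": [{"name": "Vegetable curry", "price": 5.9}],
}
menu_b = {
    "restaurant": "Bistro B",
    "week": 20,
    "days": {"monday": ["Lentil soup", "Roast chicken"]},
    "note": "Closed on Friday",
}

# In-memory stand-in for a collection; with a driver such as PyMongo these
# documents would be stored via collection.insert_one(document) instead.
menus_collection = [menu_a, menu_b]

def find_by_restaurant(collection, name):
    """Return all documents whose 'restaurant' field equals name."""
    return [doc for doc in collection if doc.get("restaurant") == name]

print(json.dumps(find_by_restaurant(menus_collection, "Bistro B"), indent=2))
```

Both documents live side by side although they share only the `restaurant` field, which is the flexibility a fixed relational schema would not offer without extra tables or nullable columns.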


03.3 | Nginx


Nginx is, on the one hand, a web server like Apache and is therefore able to serve static or dynamic content. However, the feature it is most often chosen for is reverse proxying.

A reverse proxy is a single point of entry for accessing multiple and/or decentralized applications. It accepts incoming requests and forwards them to the applications they are intended for. It can be seen as a gatekeeper between the internal infrastructure and the outside.

Additionally, we use Nginx to ensure a secure, encrypted connection to our application.


03.4 | Translation


Translations (internationalization) are managed using a Python library called Babel. Our HTML templates therefore contain no literal texts; only translation keys are declared.

The actual internationalized strings are stored in so-called PO files (a format originating from GNU gettext). A separate PO file exists for each supported language. Depending on the browser's language, the corresponding texts are loaded into the HTML template. English is used as the fallback language if the requested language is not supported.
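
The language selection with an English fallback can be sketched as below. The sketch assumes the browser's language arrives in the `Accept-Language` request header; the list of supported languages and the `pick_language` function are hypothetical, and the parsing is simplified compared to what a framework does.

```python
SUPPORTED = ("en", "de")   # hypothetical set of languages with a PO file
FALLBACK = "en"

def pick_language(accept_language, supported=SUPPORTED, fallback=FALLBACK):
    """Choose the best supported language from an Accept-Language header."""
    candidates = []
    for item in accept_language.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        candidates.append((quality, tag))
    # Highest quality first; sorted() is stable, so header order breaks ties.
    for quality, tag in sorted(candidates, key=lambda c: -c[0]):
        primary = tag.split("-")[0]   # "de-DE" -> "de"
        if quality > 0 and primary in supported:
            return primary
    return fallback
```

For instance, `pick_language("de-DE,de;q=0.9,en;q=0.8")` selects `"de"`, while a header that lists only French falls back to `"en"`. In a Flask application the same negotiation is available through `request.accept_languages.best_match`.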

04 | Frontend


04.1 | WebApp


As already mentioned, the website is rendered by the Jinja2 template engine. For styling we use webpack and Bulma CSS.

Using Jinja2, all pages are rendered on the server and then served to the client. Webpack is used to bundle our SCSS and JavaScript files into plain files that browsers can read. This ensures a clean separation of our software components without affecting the loading times of the website.


04.2 | Channels


Our data can be accessed through two channels. On the one hand, there is our website, where all menus are displayed. On the other hand, we offer chat integrations, which allow the menus to be viewed directly inside a supported chat application.

Our chat integration supports an active and a passive way to get the current menu data. First, a so-called slash command can be used. A slash command is entered within a chat and sends a message to our server, which in return delivers the current menu data.

Secondly, webhooks can be registered, which Malsato then uses to post the menu data for the current day directly into the chat. No manual step is therefore necessary to get the menus for the current day.
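
Posting to a registered webhook could look roughly like the following sketch. The webhook URL, the message layout and the `{"text": ...}` payload shape are assumptions (many chat applications accept a similar JSON body, but the exact format depends on the chat provider), and the request is only prepared, not sent.

```python
import json
import urllib.request

def format_menu_message(restaurant, dishes):
    """Build the chat message text for one restaurant."""
    lines = [f"Today's menu at {restaurant}:"]
    lines += [f"- {dish}" for dish in dishes]
    return "\n".join(lines)

def build_webhook_request(webhook_url, text):
    """Prepare a JSON POST request; the {"text": ...} body is an assumption."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

message = format_menu_message("Canteen A", ["Vegetable curry", "Lentil soup"])
request = build_webhook_request("https://chat.example.com/hooks/placeholder", message)
# Sending is left out on purpose; it would be urllib.request.urlopen(request).
```

A slash command would reuse the same message formatting, with the difference that the text is returned in the response to the chat application's request instead of being pushed by the scheduler.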