Installation for production

This page documents how to perform a native deployment on a Debian-like environment, for a udata user with its home in /srv/udata. These steps require the system dependencies to be installed.

User and home dir creation

We want to create a udata user whose primary group is udata, who is a member of the www-data group, and whose home directory is /srv/udata:

$ useradd -m -d /srv/udata -G www-data udata

You can check the result with:

$ id udata
uid=1001(udata) gid=1001(udata) groups=1001(udata),33(www-data)
$ ls -l /srv
total 4
drwxr-xr-x 9 udata udata 4096 Jun 23 05:50 udata
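The same check can be scripted. The following is a minimal sketch, assuming a Linux host and Python's standard pwd and grp modules; the helper names account_summary and matches_expected are made up for this example and are not part of udata:

```python
import grp
import pwd


def account_summary(name):
    """Return home, primary group and extra groups, or None if the user is missing."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return None
    primary = grp.getgrgid(entry.pw_gid).gr_name
    # Supplementary groups list the user explicitly in their member list
    extra = sorted(g.gr_name for g in grp.getgrall() if name in g.gr_mem)
    return {"home": entry.pw_dir, "primary_group": primary, "extra_groups": extra}


def matches_expected(summary, home="/srv/udata", primary="udata", extra="www-data"):
    """Compare a summary with the layout described on this page."""
    return (
        summary is not None
        and summary["home"] == home
        and summary["primary_group"] == primary
        and extra in summary["extra_groups"]
    )


if __name__ == "__main__":
    summary = account_summary("udata")
    print(summary)
    print("OK" if matches_expected(summary) else "account differs from the expected layout")
```

Run it with any Python interpreter available on the host; it only reads the account databases and changes nothing.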

You can now log into this account to install uData:

$ su - udata
$ pwd
/srv/udata

Python and virtual environment

It is recommended to work within a virtualenv to ensure proper dependency isolation. If you’re not familiar with that concept, read Python Virtual Environments - a Primer.

We create a virtualenv in the udata home directory so it is activated each time you log into its account:

$ virtualenv --python=python2.7 $HOME
$ . bin/activate
$ pip install Cython  # Enable optimizations on some packages
$ pip install --upgrade setuptools  # Make sure setuptools is up to date
$ pip install udata
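If you are unsure whether the virtualenv is active in your current shell, the interpreter itself can tell you. This is a small sketch rather than a udata feature; in_virtualenv is a hypothetical helper name. It relies on sys.real_prefix, which the virtualenv tool sets on older Python versions, and on sys.base_prefix, which Python 3 provides:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # virtualenv sets sys.real_prefix; Python 3 venvs expose sys.base_prefix instead
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("prefix:", sys.prefix)
```

When the environment created above is active, the printed prefix should be /srv/udata.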

You can also install the extensions you want:

$ pip install udata-piwik

Note

We install Cython before all other dependencies because some of them have optional compilation support for Cython, resulting in better performance (mostly for XML harvesting).

You can now create your configuration file:

$ touch udata.cfg
$ export UDATA_SETTINGS=/srv/udata/udata.cfg

To make the udata client easier to use, you might want to export this environment variable each time you log in:

$ echo "export UDATA_SETTINGS=/srv/udata/udata.cfg" >> .profile

Then you need to put some configuration parameters in your file. See Adapting settings for details about the configuration file.
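To make the role of UDATA_SETTINGS concrete, here is a sketch of how a Python-syntax settings file named by an environment variable can be loaded. It imitates the general approach Flask takes with file-based configuration (execute the file, keep the uppercase names). How udata itself reads the file is not shown on this page, so treat this as an illustration only; load_settings and load_from_env are hypothetical names:

```python
import os


def load_settings(path):
    """Execute a Python-syntax settings file and keep only its UPPERCASE names."""
    namespace = {}
    with open(path) as handle:
        exec(compile(handle.read(), path, "exec"), namespace)
    return {key: value for key, value in namespace.items() if key.isupper()}


def load_from_env(variable="UDATA_SETTINGS"):
    """Load the settings file whose path is stored in an environment variable."""
    path = os.environ.get(variable)
    if not path:
        raise RuntimeError("%s is not set" % variable)
    return load_settings(path)
```

With the empty udata.cfg created by touch above, load_from_env() returns an empty dictionary; each parameter you add to the file then shows up as a key.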

You’re all set: you can now use the udata command line client to initialize the platform. Just answer the questions:

$ udata init

Sample nginx & uWSGI configuration

You can use whatever stack you want to run uData: nginx or Apache 2 as reverse proxy, supervisord with Gunicorn or uWSGI, and so on.

All you need to remember is that uData requires at least 3 services to run:

  • a web frontend using the udata.wsgi WSGI entry point,
  • a worker service using Celery,
  • a beat/cron service, also using Celery.

We give you an example for a udata user serving a data.example.com domain from /srv/udata on a single server with:

  • nginx + uWSGI to run the frontend,
  • systemd handling both the worker and the beat services,
  • the middlewares running on the same host.

Install nginx and uWSGI with root privileges:

$ apt-get install nginx-full uwsgi uwsgi-plugin-python

Let’s start with the uwsgi configuration file:

/etc/uwsgi/apps-available/udata-front.ini

##
# uWSGI configuration for data.example.com front
##

[uwsgi]
master = true

; Python / Environment configuration
plugin = python
home = /srv/udata
chdir = %(home)
virtualenv = %(home)
pythonpath = %(home)/bin
module = udata.wsgi
callable = app

; Sockets and permissions
stats = /tmp/udata-front-stats.sock
socket = /tmp/udata-front.sock
chmod-socket = 664
uid = udata
gid = www-data

; Tune these values according to your environment
processes = 5
cpu-affinity = 1

; Disable requests logging
disable-logging = true

; Disable write exception when nginx timed out before uwsgi response
disable-write-exception = true

; Avoid PyMongo fork issue
; http://stackoverflow.com/questions/34369866/running-uwsgi-with-mongoengine
lazy-apps = true

; Recycle workers (tune according to your environment)
max-requests = 4000
reload-on-as = 512
reload-on-rss = 192
limit-as = 1024
no-orphans = true
vacuum = true
reload-mercy = 8

Then create the symlink to activate this configuration:

$ ln -s /etc/uwsgi/apps-{available,enabled}/udata-front.ini

You can now create the systemd unit file for Celery:

/etc/systemd/system/celery.service

##
# A systemd unit file for udata celery services
#
# This launches 4 processes:
# - a beat service
# - a high queue consumer/worker
# - a default queue consumer/worker
# - a low queue consumer/worker
##

[Unit]
Description=udata celery services
After=network.target

[Service]
Type=forking
User=udata
WorkingDirectory=/srv/udata
LogsDirectory=udata-celery
RuntimeDirectory=udata-celery
ExecStart=/srv/udata/bin/celery multi start high default low \
  -c 1 -Q:high high -Q:default high,default -Q high,default,low \
  -A udata.worker --beat:1 \
  --pidfile=/var/run/udata-celery/worker-%%n.pid \
  --logfile=/var/log/udata-celery/worker-%%n%%I.log \
  --loglevel=INFO
ExecStop=/srv/udata/bin/celery multi stopwait high default low \
  --pidfile=/var/run/udata-celery/worker-%%n.pid
ExecReload=/srv/udata/bin/celery multi restart high default low \
  -c 1 -Q:high high -Q:default high,default -Q high,default,low \
  -A udata.worker --beat:1 \
  --pidfile=/var/run/udata-celery/worker-%%n.pid \
  --logfile=/var/log/udata-celery/worker-%%n%%I.log \
  --loglevel=INFO

[Install]
WantedBy=multi-user.target

Note: This unit file handles task priorities and beat. Adapt it to your needs according to the Celery daemonizing documentation. You might need to allow more workers on a queue or extract the beat service into its own unit if you have multiple workers.
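The priority handling comes from the -Q options in the unit file: the high node consumes only the high queue, the default node consumes high and default, and the low node consumes all three. The sketch below spells out that mapping and inverts it to show how many nodes serve each queue. The dictionary is transcribed by hand from the ExecStart line, and consumers_by_queue is a hypothetical helper, not a Celery API:

```python
# Queues consumed by each node, as read from the -Q options of ExecStart
NODE_QUEUES = {
    "high": ["high"],
    "default": ["high", "default"],
    "low": ["high", "default", "low"],
}


def consumers_by_queue(node_queues):
    """Invert the node -> queues mapping into queue -> sorted list of nodes."""
    result = {}
    for node, queues in node_queues.items():
        for queue in queues:
            result.setdefault(queue, []).append(node)
    return {queue: sorted(nodes) for queue, nodes in result.items()}


if __name__ == "__main__":
    for queue, nodes in sorted(consumers_by_queue(NODE_QUEUES).items()):
        print("%-8s served by %d node(s): %s" % (queue, len(nodes), ", ".join(nodes)))
```

Tasks on the high queue can be picked up by any of the three nodes, while tasks on the low queue only ever run on the low node, which is what gives high-priority tasks their head start.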

Then load it into systemd and enable it:

$ systemctl daemon-reload
$ systemctl enable celery

Then define an nginx server host configuration in /etc/nginx/sites-available/data.example.com:

##
# nginx configuration for data.example.com
##

## uWSGI
upstream uwsgi-udata {
    ip_hash;
    server unix:///tmp/udata-front.sock;
    keepalive 32;
}

server {
    listen 80;
    server_name data.example.com;

    access_log /var/log/nginx/data.example.com.access.log;
    error_log /var/log/nginx/data.example.com.error.log;

    client_max_body_size 0; # Disable max client body size

    root /srv/udata/public/;

    # Enable gzip compression
    gzip on;
    gzip_disable "msie6";
    gzip_min_length  1100;
    gzip_buffers  4 32k;
    gzip_types
        application/atom+xml
        application/javascript
        application/json
        application/rdf+xml
        application/rss+xml
        application/vnd.geo+json
        application/vnd.ms-fontobject
        application/x-font-ttf
        application/x-javascript
        application/xml
        font/opentype
        image/svg+xml
        image/x-icon
        text/css
        text/csv
        text/javascript
        text/plain
        text/xml;
    gzip_vary on;

    add_header Pragma public;
    add_header Cache-Control public;
    add_header Connection "keep-alive";

    location / {

        try_files $uri @wsgi;

        location ~ /static/ {
            expires 1M;
        }

        location ~ /_themes/ {
            expires 1M;
        }

        location ~ /s/ {
            # Resources are stored separately
            alias /srv/udata/fs/;
            # Disable disk buffering for downloads
            proxy_max_temp_file_size 0;
            expires 1M;

            add_header 'Access-Control-Allow-Origin' '*';
            add_header 'Access-Control-Allow-Methods' 'GET, OPTIONS';
        }
    }

    location @wsgi {
        uwsgi_pass uwsgi-udata;
        include uwsgi_params;

        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }
}

Then create a symlink to activate it:

$ ln -s /etc/nginx/sites-{available,enabled}/data.example.com
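A common mistake at this stage is a mismatch between the socket declared in the uWSGI file and the one referenced by the nginx upstream. The following sketch, meant for any Python 3 interpreter on the host, compares the two. The function names are made up for this example, and the regular expression only recognises the simple server unix:...; form used above:

```python
import configparser
import re


def uwsgi_socket(ini_text):
    """Return the socket path declared in the [uwsgi] section, or None."""
    # Interpolation is disabled because uWSGI uses its own %(name) placeholders
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("uwsgi", "socket", fallback=None)


def nginx_unix_sockets(conf_text):
    """Return the unix socket paths referenced by nginx 'server unix:...;' lines."""
    paths = re.findall(r"server\s+unix:([^;\s]+)", conf_text)
    # unix:///tmp/x.sock and unix:/tmp/x.sock designate the same file
    return ["/" + path.lstrip("/") for path in paths]


def sockets_match(ini_text, conf_text):
    """Return True when nginx points at the socket that uWSGI creates."""
    socket_path = uwsgi_socket(ini_text)
    return socket_path is not None and socket_path in nginx_unix_sockets(conf_text)


if __name__ == "__main__":
    try:
        with open("/etc/uwsgi/apps-available/udata-front.ini") as ini_file:
            ini_text = ini_file.read()
        with open("/etc/nginx/sites-available/data.example.com") as conf_file:
            conf_text = conf_file.read()
    except OSError as error:
        print("cannot read configuration:", error)
    else:
        print("sockets match:", sockets_match(ini_text, conf_text))
```

With the two files from this page, both sides resolve to /tmp/udata-front.sock.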

Before restarting all services to start uData, we need to adjust its configuration and collect static assets to make them available to nginx. Log back into the udata account:

$ su - udata

Edit your udata.cfg configuration with these parameters:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

DEBUG = False

SITE_ID = 'data.example.com'  # Used to store metrics and portal configuration
SITE_TITLE = 'My awesome open data portal'

SERVER_NAME = 'data.example.com'

SECRET_KEY = 'put-some-unique-and-secret-key-here-for-security'

MONGODB_HOST = 'mongodb://localhost:27017/udata'

ELASTICSEARCH_URL = 'localhost:9200'

BROKER_URL = 'redis://localhost:6379'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_TASK_RESULT_EXPIRES = 86400

# We use Redis as caching backend but in a separate database
CACHE_TYPE = 'redis'
CACHE_REDIS_URL = 'redis://localhost:6379/2'

# The identity used to send mails
MAIL_DEFAULT_SENDER = ('Open Data Portal', 'portal@data.example.com')

# Set your available languages
LANGUAGES = {
    'fr': 'Français',
    'en': 'English',
}
# Here is your default language
DEFAULT_LANGUAGE = 'fr'

# Optionally activate some installed plugins
PLUGINS = (
    'piwik',
)

# Optionally activate an installed theme
THEME = 'my-theme'

# Define where resources are stored and exposed
FS_ROOT = '/srv/udata/fs'
FS_PREFIX = '/s'
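The SECRET_KEY value in the configuration above is a placeholder and should be replaced by a random string of your own. One possible way to produce such a string, assuming nothing beyond the Python standard library (generate_secret_key is a name chosen for this example):

```python
import binascii
import os


def generate_secret_key(num_bytes=32):
    """Return num_bytes of OS-provided randomness as a hexadecimal string."""
    return binascii.hexlify(os.urandom(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed line into udata.cfg in place of the placeholder
    print("SECRET_KEY = '%s'" % generate_secret_key())
```

Generate the key once and keep it stable afterwards, since applications of this kind typically use it to sign session data.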

You can now process static assets in the directory declared in the nginx configuration (i.e. /srv/udata/public):

$ udata collect -ni $HOME/public

Alright, everything is ready to run uData, so log out of the udata account and restart uWSGI, Celery and nginx:

$ service uwsgi restart
$ service celery restart
$ service nginx restart

And then go see your awesome open data portal at http://data.example.com.
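If you prefer a scripted smoke test to opening a browser, a sketch along these lines can be run with Python 3. The http_status helper is hypothetical; it bypasses any HTTP proxy configured in the environment so that the request reaches the server directly:

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    # An empty ProxyHandler ignores proxy settings from the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (404, 500, ...)
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, ...
        return None
```

For example, http_status("http://data.example.com") is expected to return 200 once the three services are up. None points to a network or nginx problem, and a 5xx code usually means nginx is running but cannot reach uWSGI.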