Step by step deployment on bare metal: CMS XCache

Cache server standalone installation

Requirements

  • OS: CentOS 7
  • Port: one open service port
  • Valid CMS /etc/vomses files
  • Valid grid host certificate
  • Valid service certificate that is able to read from AAA (/etc/grid-security/xrd/xrdcert.pem, /etc/grid-security/xrd/xrdkey.pem)
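Before installing anything, it can help to confirm that the files listed above are in place. The following is a minimal sketch, not part of the original procedure: check_required and cert_valid_for are hypothetical helper names, and the only paths assumed are the ones given in the requirements.

```shell
#!/bin/bash
# Preflight sketch: report which of the required files are missing.
# /etc/vomses may be a file or a directory depending on the site,
# so only existence is tested.

check_required() {
    # Prints one line per path; returns non-zero if any path is missing.
    missing=0
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "OK       $path"
        else
            echo "MISSING  $path"
            missing=1
        fi
    done
    return "$missing"
}

cert_valid_for() {
    # $1 = certificate file, $2 = seconds.
    # Succeeds if the certificate is still valid that many seconds from now.
    openssl x509 -noout -checkend "$2" -in "$1" >/dev/null
}

# Example invocation on the cache host (commented out so the snippet
# can be sourced anywhere):
# check_required /etc/vomses \
#     /etc/grid-security/xrd/xrdcert.pem \
#     /etc/grid-security/xrd/xrdkey.pem
# cert_valid_for /etc/grid-security/xrd/xrdcert.pem 604800 || echo "expires within a week"
```

The function only tests for existence; whether the service certificate is actually authorized to read from AAA still has to be verified with a real transfer.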

Packages and CAs installation

Create and execute the following script (we are going to install version 4.8.3 for testing purposes, but also consider 4.9 and 4.10, as they include new features and fixes):

#!/bin/bash
XRD_VERSION=4.8.3-1.el7

echo "LC_ALL=C" >> /etc/environment \
    && echo "LANGUAGE=C" >> /etc/environment \
    && yum --setopt=tsflags=nodocs -y update \
    && yum --setopt=tsflags=nodocs -y install wget \
    && yum clean all

cd /etc/yum.repos.d
wget http://repository.egi.eu/community/software/preview.repository/2.0/releases/repofiles/centos-7-x86_64.repo \
    && wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo
yum --setopt=tsflags=nodocs -y install epel-release yum-plugin-ovl \
    && yum --setopt=tsflags=nodocs -y install fetch-crl wn sysstat \
    && yum clean all

yum install -y ca-policy-egi-core ca-policy-lcg
/usr/sbin/fetch-crl -q

yum install -y xrootd-server-$XRD_VERSION

mkdir -p /etc/grid-security/xrd/

chown -R xrootd:xrootd /etc/grid-security/xrd/

systemctl enable fetch-crl-cron
systemctl start fetch-crl-cron

curl -L -O https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.2.4-x86_64.rpm
sudo rpm -vi metricbeat-6.2.4-x86_64.rpm

Proxy Renewal service

In order to keep the service proxy valid, a simple service can be created on the machine. First create a service file (e.g. /usr/lib/systemd/system/xrootd-renew-proxy.service):

[Unit]
Description=Renew xrootd proxy

[Service]
User=xrootd
Group=xrootd
Type = oneshot
ExecStart = /bin/voms-proxy-init --cert /etc/grid-security/xrd/xrdcert.pem --key /etc/grid-security/xrd/xrdkey.pem -voms cms -valid 48:00

[Install]
WantedBy=multi-user.target

Then a timer unit is required to schedule the proxy renewal (/usr/lib/systemd/system/xrootd-renew-proxy.timer):

[Unit]
Description=Renew proxy every day at midnight

[Timer]
OnCalendar=*-*-* 00:00:00
Unit=xrootd-renew-proxy.service

[Install]
WantedBy=multi-user.target

At this point you can reload systemd, so that it picks up the new unit files, and then start the timer:

systemctl daemon-reload
systemctl start xrootd-renew-proxy.timer
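Since the proxy is created with a 48-hour validity and renewed once a day, its remaining lifetime should normally stay between 24 and 48 hours. A remaining lifetime below 24 hours therefore suggests that a renewal was missed. The sketch below encodes that check; proxy_needs_renewal is a hypothetical helper, and the 86400-second default is simply the 24-hour figure derived above.

```shell
#!/bin/bash
# Decide from the remaining lifetime (in seconds) whether the proxy
# looks stale. The second argument overrides the default threshold.

proxy_needs_renewal() {
    timeleft=$1
    threshold=${2:-86400}
    # Reject empty or non-numeric input instead of comparing garbage.
    case "$timeleft" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$timeleft" -lt "$threshold" ]
}

# On the cache host, run as the xrootd user (commented out here because
# it needs the VOMS clients and an existing proxy):
# left=$(voms-proxy-info -timeleft) && proxy_needs_renewal "$left" \
#     && echo "proxy has less than 24h left, check xrootd-renew-proxy.service"
```

The function returns 0 when renewal is due, 1 when the proxy is fresh enough, and 2 when the input is not a non-negative integer.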

XRootD server configuration

The reference guide for the configuration is the official one here. For version 4.9 or newer, please refer to the recommended ones here.

What follows is a working configuration used and tested at different sites. Its purpose is to show the main knobs available and how to treat them. Create a configuration file (e.g. /etc/xrootd/xrootd-xcache.cfg):

# xrd and cmsd process ports
set xrdport=1094
set cmsdport=1213

# cache redirector address
set rdtrCache=0.0.0.0
set rdtrPortCmsd=1213

# address and port of the origin servers
set rdtrGlobal=xrootd-cms.infn.it
set rdtrGlobalPort=1094

# disk occupation water marks
set cacheLowWm=0.80
set cacheHiWm=0.90

# log level for cache processes
set cacheLogLevel=info

# path to folder for storing data, NB it has to be owned by xrootd user
set cachePath=/data/xrd

# ram dedicated to cache (in GB), <=50% of the total is suggested
set cacheRam=16


all.manager $rdtrCache:$rdtrPortCmsd

# logging level for all the different activities
xrootd.trace info
ofs.trace info
xrd.trace info
cms.trace info
sec.trace info
pfc.trace $cacheLogLevel

if exec cmsd

# if the process is the cluster manager, just run it on the chosen port

all.role server
xrd.port $cmsdport

all.export / stage
oss.localroot $cachePath

else

# if the process is the xrd one, configure and start the cache service at the specified port

xrd.port $xrdport

##### GENERAL CONFIGURATION ######

# manage the work directory
all.export /
all.role  server
oss.localroot $cachePath

oss.space meta $cachePath/
oss.space data $cachePath/
pfc.spaces data meta

# if the system is overloaded, fall back to remote read
xrootd.fsoverload redirect xrootd-cms.infn.it:1094

# For xrootd, load the proxy plugin and the disk caching plugin.
ofs.osslib   libXrdPss.so
pss.cachelib libXrdFileCache.so

# indicate the origin
pss.origin $rdtrGlobal:$rdtrGlobalPort

##### SECURITY CONFIGURATION ######
xrootd.seclib /usr/lib64/libXrdSec.so

# use gsi as client-cache authN method
sec.protocol /usr/lib64 gsi \
  -certdir:/etc/grid-security/certificates \
  -cert:/etc/grid-security/xrd/xrdcert.pem \
  -key:/etc/grid-security/xrd/xrdkey.pem \
  -d:3 \
  -crl:1
sec.protbind * gsi

ofs.authorize 1
acc.audit deny grant

# use gsi user<-->namespace mapping file as client-cache authZ method
acc.authdb /etc/xrootd/Authfile-auth


##### CACHE CONFIGURATION ######

pfc.diskusage $cacheLowWm $cacheHiWm
pfc.ram       ${cacheRam}g

# Tune the client timeouts to time out more aggressively.
pss.setopt ParallelEvtLoop 10
pss.setopt RequestTimeout 25
pss.setopt ConnectTimeout 25
pss.setopt ConnectionRetry 2

# Standard values for streaming mode jobs
set cacheStreams=256
set prefetch=0
set blkSize=512k

pss.config streams $cacheStreams
pfc.blocksize   $blkSize
pfc.prefetch    $prefetch

fi
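To make the sizing knobs above more concrete, the following sketch works out what the watermarks and the RAM suggestion mean in absolute numbers. It assumes the usual reading of pfc.diskusage, in which purging starts once disk usage exceeds the high watermark and continues until usage falls below the low one. Both function names are hypothetical helpers, not XRootD tools.

```shell
#!/bin/bash
# Translate the fractional watermarks into sizes for a given partition.
purge_thresholds_gb() {
    # $1 = partition size in GB, $2 = low watermark, $3 = high watermark
    awk -v size="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { printf "purge starts above %.0f GB, stops below %.0f GB\n", size*hi, size*lo }'
}

# Suggested upper bound for cacheRam: half of the total memory.
suggested_cache_ram_gb() {
    # $1 = MemTotal in kB, as reported in /proc/meminfo
    echo $(( $1 / 2 / 1024 / 1024 ))
}

# Example for a 1000 GB cache partition with the values used above:
# purge_thresholds_gb 1000 0.80 0.90
# Example using the memory of the current host:
# suggested_cache_ram_gb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

With the values from the configuration, a 1000 GB partition would start purging above 900 GB and stop below 800 GB, and a host with 32 GiB of memory gives the cacheRam=16 used in the example.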

In addition, for the authZ part, a file has to be created with the desired permissions per user (e.g. /etc/xrootd/Authfile-auth):

# full permissions to all users for both /store/* paths and /*
u * /store/ a / a
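The rule above grants every privilege on every path. Since clients only read from a cache, a site may prefer a narrower rule. The sketch below writes such a variant, assuming the standard XRootD authorization letters (l for lookup, r for read); write_readonly_authfile is a hypothetical helper, and restricting access to /store/ is an illustrative choice rather than something the tested setup uses.

```shell
#!/bin/bash
# Write a more restrictive Authfile: all users may only look up and
# read files under /store/.
write_readonly_authfile() {
    # $1 = destination path, e.g. /etc/xrootd/Authfile-auth
    cat > "$1" <<'EOF'
# lookup and read permissions only, for all users, on /store/
u * /store/ lr
EOF
}

# Example (commented out, it overwrites the target file):
# write_readonly_authfile /etc/xrootd/Authfile-auth
```

After changing the file, the test transfer described later in this guide is a quick way to confirm that reads under /store/ still work.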

Start XCache daemons

The only thing left now is to start the respective daemons with:

# enable and start xrootd server daemons
systemctl enable xrootd@xcache.service
systemctl enable cmsd@xcache.service

systemctl start xrootd@xcache.service
systemctl start cmsd@xcache.service

Test the deployment

  • to check if the daemons started correctly just use systemctl as below:
systemctl status xrootd@xcache.service
systemctl status cmsd@xcache.service

In case of problems, logs can be found in /var/log/xrootd/xcache.

  • then you can try to copy a file from the origin:
xrdcp -f -v xroot://localhost:<xrdport defined in the configuration above>/<path to your file in origin> /dev/null
  • the expected outcome is something like:
[root@xrootdcentostest centos]# xrdcp -f -v xroot://localhost:32294//store/mc/RunIISummer17DRPremix/QCD_Pt-15to20_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/AODSIM/92X_upgrade2017_realistic_v10-v2/90000/C85940F6-9596-E711-8FD6-D8D385FF1940.root /dev/null
[544MB/3.108GB][ 17%][========>                                         ][19.43MB/s]
  • Finally you should be able to see your file on the cache disk on the path you indicated in the configuration.
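The last check can be scripted. The sketch below assumes that the cache writes a companion metadata file with a .cinfo suffix next to each cached file, which is how the disk caching plugin normally tracks downloaded blocks. list_cached_files is a hypothetical helper.

```shell
#!/bin/bash
# List the files known to the cache by looking for their .cinfo
# companions under the cache root.
list_cached_files() {
    # $1 = cache root (cachePath in the configuration, e.g. /data/xrd)
    # No -type filter: entries in the namespace may be symlinks.
    find "$1" -name '*.cinfo' | sed 's/\.cinfo$//' | sort
}

# Example on the cache host:
# list_cached_files /data/xrd
```

If the list stays empty after a successful xrdcp, the ownership of the cache directory is worth checking, since it has to be owned by the xrootd user.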

Redirector installation

The configuration for a cache redirector is really simple (e.g. /etc/xrootd/xrootd-cacheredir.cfg):

set rdtrcache=<redirector host>
set rdtrportcmsd=<redirector cluster manager port>
set rdtrportxrd=<redirector xrd port>

all.manager $rdtrcache:$rdtrportcmsd

# temporary fix for CMS multisource jobs - probably fixed in version 5
cms.sched  maxretries 0 nomultisrc

xrd.allow host *
xrd.port $rdtrportxrd
xrd.port $rdtrportcmsd if exec cmsd
all.export /store stage r/o
all.role manager

and then just start the daemons:

# enable and start xrootd redirector daemons
systemctl enable xrootd@cacheredir.service
systemctl enable cmsd@cacheredir.service

systemctl start xrootd@cacheredir.service
systemctl start cmsd@cacheredir.service
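Cache servers join the redirector through its cluster manager port, so a quick connectivity check from a cache host can save some log reading. This is a minimal sketch using bash's /dev/tcp pseudo-device; port_open is a hypothetical helper, and the 3-second timeout is an arbitrary choice.

```shell
#!/bin/bash
# Succeeds if a TCP connection to host:port can be opened within 3 seconds.
port_open() {
    # $1 = host, $2 = port
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, from a cache server (replace the placeholders with the values
# of rdtrcache, rdtrportcmsd and rdtrportxrd):
# port_open <redirector host> <redirector cluster manager port> && echo "cmsd port reachable"
# port_open <redirector host> <redirector xrd port> && echo "xrd port reachable"
```

A successful connection only shows that the port is open. Whether the cache actually registered with the redirector is visible in the cmsd logs.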

Metricbeat installation

Create a metricbeat configuration file (e.g. /etc/metricbeat/metricbeat.yml):

# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/metricbeat/index.html

#==========================  Modules configuration ============================
metricbeat.modules:

#------------------------------- System Module -------------------------------
- module: system
  metricsets:
    # CPU stats
    - cpu

    # System Load stats
    - load

    # Per CPU core stats
    - core

    # IO stats
    - diskio

    # Per filesystem stats
    - filesystem

    # File system summary stats
    - fsstat

    # Memory stats
    - memory

    # Network stats
    - network

    # Per process stats
    - process

    # Sockets (linux only)
    #- socket
  enabled: true
  period: 60s
  processes: ['.*']


#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
name: 'DUMMY: cache sitename'

#================================ Outputs =====================================

# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["DUMMY_esHost.com"]
  template.name: "metricbeat_slave"
  template.path: "metricbeat.template.json"
  template.overwrite: false

  # Optional protocol and basic auth credentials.
  protocol: "http"
  username: "dodas"
  password: "DUMMY"

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: critical, error, warning, info, debug
#logging.level: debug

and then just start the daemon:

# enable and start metricbeat service
systemctl enable metricbeat.service
systemctl start metricbeat.service
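The configuration above contains several DUMMY placeholders (shipper name, Elasticsearch host, password) that must be replaced with site values. The sketch below flags any that remain before the service is started; has_placeholders is a hypothetical helper.

```shell
#!/bin/bash
# Print (with line numbers) every line that still contains a DUMMY
# placeholder. Succeeds if at least one placeholder is found.
has_placeholders() {
    # $1 = configuration file
    grep -n 'DUMMY' "$1"
}

# Example on the cache host:
# if has_placeholders /etc/metricbeat/metricbeat.yml; then
#     echo "replace the DUMMY values before starting metricbeat"
# fi
```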