Basic XRootD

Latest revision as of 01:34, 27 September 2016

Disclaimer

The information below is not meant to replace XRootD documentation. It may be helpful for experimentation with a small xrootd cluster and getting familiar with the basics of XRootD configuration.

Installing or Building XRootD

The XRootD team provides rpm packages for installation on Scientific Linux. If your OS is not supported in this manner, building from source is a workable option and is not difficult. Download the archive and follow the instructions on the official XRootD site.

Be sure to consult the README file included in the archive. One small caveat is that the "source directory" mentioned in the README is not the "src" directory, as most people would expect, but the one a level above it (i.e. the directory which contains the unpacked content of the archive you downloaded).

A recent version of CMake is required for the build. The build script can be finicky when it tests the features of the C++ compiler found on the system, so you may need to upgrade the compiler or install an alternative one, and adjust PATH and other elements of your setup accordingly. Expect to take care of a few dependencies as well, for example by installing packages like zlib.
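The build steps described above can be sketched roughly as follows. This is a minimal sketch of a generic out-of-source CMake build, not a replacement for the README; the function names, the "build" subdirectory and the install prefix are assumptions made for illustration.

```shell
# List the given tools that cannot be found on PATH (prints nothing if all are present).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Out-of-source CMake build, run in a subshell so the caller's directory is unchanged.
# $1: top-level directory of the unpacked archive (NOT its "src" subdirectory)
# $2: installation prefix (hypothetical example: "$HOME/xrootd")
build_xrootd() {
    (
        mkdir -p "$1/build" && cd "$1/build" || exit 1
        cmake .. -DCMAKE_INSTALL_PREFIX="$2" && make && make install
    )
}
```

A typical session would first run "missing_tools cmake make c++" to see what needs installing, then "build_xrootd /path/to/unpacked/archive $HOME/xrootd". Installing under a prefix in the home directory avoids the need for "sudo".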

Running XRootD

Controlling your cluster

Basic experimentation with XRootD can start with utilizing a few machines which are idle or aren't heavily loaded by other applications. It is very helpful to have the "sudo" privilege (e.g. for installation in standard directories and configuring auxiliary services) but strictly speaking this is not 100% necessary. Some Linux trivia helpful for controlling your cluster can be found on the Linux Tools page.

Configuration and Log Files

In most cases you will want to run two processes on each machine, "cmsd" and "xrootd". Each of them needs a proper configuration file; see the corresponding sections below. Be sure to include paths for log files in the configuration so that the log information is readily available for debugging.

Starting a simple instance of xrootd service

There is more than one way to start the xrootd service (see documentation). The simplest way is to start the requisite daemon processes from the command line. Starting the xrootd daemon by itself is enough to serve data from a single node (i.e. without creating a storage cluster).

xrootd -c configFile.cfg /path/to/data &

The "/path/to/data" denotes the designated directory from which data may be served. In xrootd terminology, this path is "exported". In the above example the file configFile.cfg contains the other configuration details. Without options or a config file, xrootd will still run with some simple defaults which allow for testing, but the setup isn't likely to be useful for any real application. For example, if you just type "xrootd" at the OS prompt, you will get a service running on that computer with the following limitations:

  • the directory /tmp will be exported, for security reasons
  • there will be no clustering, just this one server ready to serve the data

This, however, allows for straightforward basic testing (i.e. whether it compiled OK) right out of the box.

In addition to the command line option, the path which is to be exported can also be defined in the configuration file (which is preferable), in which case it's not necessary to put it in the command line. See the corresponding section below.

The "-b" option starts the process in the background, and the "-l" option can be used to specify the path to the log file (otherwise stderr will be assumed). Examples:

cmsd -b -l /path/to/log/cmsd.log -c client.cfg
xrootd -b -l /path/to/log/xrootd.log -c client.cfg

The "cmsd" is the clustering daemon which is explained in one of the following sections.


If the "path to data" is not explicitly defined, xrootd will default to /tmp, which might work for initial testing but isn't practical otherwise. Whether xrootd is running as expected can be tested by using the xrdcp client from any machine from which the server is accessible, e.g.

xrdcp myFile.txt root://serverIP//path/to/data
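The test above can be extended into a round trip: copy a file to the server, fetch it back and compare the two copies. The following is a minimal sketch; the function names are hypothetical, and it assumes that the server permits writes to the exported path and that xrdcp is on PATH.

```shell
# Build a root:// URL; the path is absolute, hence the double slash after the host.
xrd_url() {
    printf 'root://%s/%s\n' "$1" "$2"
}

# Copy a local file to the server, fetch it back and compare the two copies.
# $1: server host, $2: local file, $3: exported directory on the server
roundtrip_test() {
    remote=$(xrd_url "$1" "$3/$(basename "$2")")
    copy=$(mktemp) || return 1
    # -f overwrites an existing destination, including the empty temporary file
    xrdcp -f "$2" "$remote" && xrdcp -f "$remote" "$copy" && cmp -s "$2" "$copy"
    status=$?
    rm -f "$copy"
    return $status
}
```

For example, "roundtrip_test serverIP myFile.txt /path/to/data" returns success only if both transfers work and the content is unchanged.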

Clustering

In a clustered environment, you also need to start the cluster manager daemon, e.g.

xrootd -c configFile.cfg /path/to/data &
cmsd -c configFile.cfg /path/to/data &

Alternatively,

cmsd -b -l /path/to/log/cmsd.log -c client.cfg
xrootd -b -l /path/to/log/xrootd.log -c client.cfg

...in which case the log files are explicitly defined on the command line (as opposed to the default stderr) and the processes are run as daemons. In this example the exported path is defined in the config file, so there is no need to put it in the command line.
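When several nodes have to be started the same way, the two commands can be wrapped in a small helper. This is only a sketch: the function name and the DRY_RUN variable are inventions for illustration, while the "-b", "-l" and "-c" options are the ones used above.

```shell
# Start cmsd and xrootd as background daemons with explicit log files.
# $1: configuration file, $2: log directory
# With DRY_RUN=1 the commands are only printed, which is handy for checking paths.
start_node() {
    for daemon in cmsd xrootd; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$daemon -b -l $2/$daemon.log -c $1"
        else
            "$daemon" -b -l "$2/$daemon.log" -c "$1" || return 1
        fi
    done
}
```

Running "DRY_RUN=1 start_node client.cfg /path/to/log" prints the two command lines without starting anything.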

The data in the cluster is exposed through the manager node, whose address is to be used in queries. Example:

xrdcp -f xroot://managerIP//my/path/foo local_foo

The file "foo" will be located and if it exists, will be copied to "local_foo" on the machine running the xrdcp client. Caveat: if multiple files exist in the system under the same path, the result (i.e. which one gets fetched) is random.

Configuration File

An example of a working configuration file suitable for a server node (not for the manager node):

all.role server
all.export /path/to/data
all.manager 192.168.0.191:3121
xrd.port 1094
acc.authdb /path/to/data/auth_file

In the example above the IP address of the manager is arbitrary; it needs to be set to the actual address of your manager node.
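Since only the manager address and the data path differ between sites, the file can be generated from a small template. The sketch below writes the same directives as the sample above; the function name and its arguments are hypothetical.

```shell
# Write a server-node configuration file (directives as in the sample above).
# $1: output file, $2: manager host or IP, $3: exported data directory
write_server_cfg() {
    cat > "$1" <<EOF
all.role server
all.export $3
all.manager $2:3121
xrd.port 1094
acc.authdb $3/auth_file
EOF
}
```

For example, "write_server_cfg client.cfg 192.168.0.191 /path/to/data" reproduces the sample file.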

authdb

The "authdb" setting is important: things mostly won't work without proper authorization (quite primitive in this case, as it relies on a file with permissions). If all users are given access to all data, the content of the file can be as simple as:

u * /path/to/data lr
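The authorization file can be created in the same scripted manner. This is a sketch under the assumption that the file lives inside the exported directory under the name "auth_file", as in the acc.authdb line of the sample configuration; the function name is hypothetical.

```shell
# Create an authorization file giving every user ("u *") the "lr" privileges
# on the exported path, exactly as in the one-line example above.
# $1: exported data directory
write_auth_file() {
    printf 'u * %s lr\n' "$1" > "$1/auth_file"
}
```

For example, "write_auth_file /path/to/data" produces the line shown above.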

Redirector

The redirector coordinates the function of the cluster. For example, it finds the data based on the path given by clients such as xrdcp, without the client having to know which node contains this bit of data. A crude (but working) example of the redirector configuration (for cmsd):

all.manager managerIP:3121
all.role manager
xrd.port 3121
all.export /path/to/data
acc.authdb /path/to/data/auth_file

Note the port number. This is not the data port but the service port to be used for communication inside the cluster (e.g. for metadata).

The configuration of the server (i.e. for xrootd) might look like this:

all.manager managerIP:3121
all.role manager
xrd.port 1094
all.export /path/to/data
acc.authdb /path/to/data/auth_file

The simplest way to initialize the redirector service on this node is as follows:

xrootd -c server.cfg /path/to/data &
cmsd -c redir.cfg /path/to/data &

xrdfs

File Info

Filesystem functionality. Example:

xrdfs managerIP ls -l /my/path
xrdfs managerIP ls -u /my/path

In the above, the first command behaves like "ls -l" in a Linux shell, and the second prints the URLs of the files.

The following command locates the path, i.e. returns the address(es) of the server(s) which physically hold(s) the path - can be multiple machines:

xrdfs managerIP locate /my/path

Adding the "-r" option will force the server to refresh, i.e. to do a fresh query. Otherwise, a cached result will be used if it exists.

The "stat" command provides info similar to the Linux "stat" command:

xrdfs managerIP stat /my/path

The "rm" command does what the name suggests, with the usual caveat that if the same path is present on several machines, the result will be arbitrary: only one of the files will be deleted at a time.

Host Info

xrdfs hostIP query config role

Checksum

XRootD hosts can report checksums for files, with a few checksum algorithms available. To enable this on a host a special line needs to be added to the configuration file, for example:

xrootd.chksum md5

As usual, it is only necessary to query the redirector in order to get this info with the xrdfs client:

xrdfs managerIP query checksum /my/path/to/file
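The reported checksum can be compared against a local copy of the file. The sketch below makes several assumptions: the host is configured with md5 as above, the "md5sum" utility is available locally, and the xrdfs reply contains the digest as 32 lowercase hex characters somewhere in its output (the exact reply format is not relied upon). The function names are hypothetical.

```shell
# Pull the first 32-character hex digest out of whatever is fed on stdin.
extract_md5() {
    grep -oE '[0-9a-f]{32}' | head -n 1
}

# Compare the md5 reported by the cluster with that of a local copy.
# $1: manager host, $2: remote path, $3: local file
verify_md5() {
    remote=$(xrdfs "$1" query checksum "$2" | extract_md5)
    local_sum=$(md5sum "$3" | extract_md5)
    # Fail if no digest came back at all, or if the two digests differ.
    [ -n "$remote" ] && [ "$remote" = "$local_sum" ]
}
```

For example, "verify_md5 managerIP /my/path/to/file local_copy" returns success only when both digests are present and identical.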

Hardware Options

  • Neut Cluster at CERN (https://twiki.cern.ch/twiki/bin/view/CENF/NeutrinoClusterCERN)