Difference between revisions of "Linux Tools"

From DUNE
 
* Moved to a separate [[Apache]] page.
 
 
* Moved to a separate [[PostgreSQL]] page.
 
 
=Apache=
 
==Installation==
 
On Ubuntu:
 
<pre>
 
sudo apt-get install apache2 # Apache
 
sudo apt-get install libapache2-mod-wsgi-py3 # mod_wsgi for Python3
 
</pre>
 
 
==Start-Stop-Restart==
 
===Ubuntu===
 
To start/stop/restart Apache 2 web server, enter one of the commands in each category:
 
<pre>
 
### START
 
/etc/init.d/apache2 start
 
sudo /etc/init.d/apache2 start
 
sudo service apache2 start
 
### STOP
 
/etc/init.d/apache2 stop
 
sudo /etc/init.d/apache2 stop
 
sudo service apache2 stop
 
### RESTART
 
/etc/init.d/apache2 restart
 
sudo /etc/init.d/apache2 restart
 
sudo service apache2 restart
 
</pre>
 
 
System status:
 
<pre>
 
systemctl status apache2.service
 
</pre>
 
 
 
===CentOS/RH===
 
On RedHat Linux, the name of the daemon is httpd.
 
Also, the "service" command may simply redirect to systemctl.
 
<pre>
 
systemctl status -l httpd.service # or:
 
sudo systemctl start httpd.service
 
</pre>
 
 
==Apache Configuration==
 
===General Items===
 
KeepAlive sets the tradeoff between memory and CPU usage by Apache.
 
 
Serving static files:
 
https://docs.djangoproject.com/en/1.10/howto/deployment/wsgi/modwsgi/#serving-files
 
 
Official Layout of the Config Files:
 
https://wiki.apache.org/httpd/DistrosDefaultLayout
 
This, however, is not written in stone. Some details
 
are given below.
 
 
===Ubuntu===
 
See /etc/apache2/apache2.conf
 
 
Snippet from 000-default.conf on Ubuntu:
 
<pre>
 
        ServerName promptproc
 
        ServerAlias promptproc
 
 
 
        WSGIScriptAlias / /home/maxim/projects/p3s/promptproc/promptproc/wsgi.py
 
 
        Alias /static/ /var/www/static/
 
        <Directory /var/www/static>
 
        Require all granted
 
        </Directory>
 
 
        <Directory /home/maxim/projects/p3s/promptproc/promptproc>
 
        <Files wsgi.py>
 
        Require all granted
 
        </Files>
 
        </Directory>
 
 
</pre>
 
The "static" directory must contain static content such as themes for the tables2 package.
 
Keep in mind that while this is served automatically by the Django development server,
 
it's not the case under Apache.
 
 
 
The file wsgi.conf needs to contain a reference to the Python runtime, such as:
 
<pre>
 
WSGIPythonPath /home/maxim/.local/lib/python3.5/site-packages
 
</pre>
 
 
=== CentOS ===
 
See /etc/httpd/. Examples:
 
 
<pre>
 
[mxp@neutdqm p3s]$ ls /etc/httpd/
 
conf  conf.d  conf.modules.d  logs  modules  run
 
[mxp@neutdqm p3s]$ ls /etc/httpd/conf.d/
 
autoindex.conf  django.conf  php.conf  README  userdir.conf  welcome.conf
 
[mxp@neutdqm p3s]$ ls /etc/httpd/conf/
 
httpd.conf  magic
 
</pre>
 
 
===Permissions===
 
In addition to granting permissions in the Apache configuration file (see the "Require all granted" directives in the Ubuntu snippet above), correct permissions need to be set for the directory tree containing wsgi.py and other crucial files. For example, if the tree is in your home directory and that directory is not readable by others, Apache will not be able to serve the application. One way to make it work (perhaps not the best) is to set mode 755 on your home directory.
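
A quick way to find the offending path segment is to walk from the file up to the root and look at the "other" execute bit of each directory. The following is only a minimal sketch: check_path is a hypothetical helper, not part of Apache or Django, and it ignores group membership, ACLs and SELinux.

```shell
# Hypothetical helper: list every path component leading to a file and
# flag directories that "other" users (such as the Apache user) cannot enter.
check_path() {
    p="$1"
    while [ "$p" != "/" ] && [ "$p" != "." ]; do
        # Keep only the 10-character mode string, e.g. drwxr-xr-x
        mode="$(ls -ld "$p" | cut -c1-10)"
        case "$mode" in
            # Character 10 is the "other" execute bit; "-" or "T" means no access.
            d????????[-T]) echo "BLOCKED $mode $p" ;;
            *)             echo "ok      $mode $p" ;;
        esac
        p="$(dirname "$p")"
    done
}
```

Usage would look like "check_path /home/maxim/projects/p3s/promptproc/promptproc/wsgi.py". On Linux, "namei -l" followed by the same path prints similar per-segment information.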
 
 
On top of that, SELinux will impose its own restrictions. See:
 
<pre>
 
getenforce
 
</pre>
 
 
If it shows "Enforcing", try switching to permissive mode (this lasts until the next reboot):
 
<pre>
 
sudo setenforce 0
 
</pre>
 
 
===mod_wsgi===
 
* When using mod_wsgi, make sure its version matches the Python version; this needs to be specified when mod_wsgi is installed (see "Installation" above). You can use "ldd" on mod_wsgi.so to check its dependencies, including the required Python version. It is possible that the mod_wsgi you installed has a long library name containing various metadata while an older mod_wsgi file is still present and gets loaded instead. This needs to be taken care of.
 
* https://www.sitepoint.com/deploying-a-django-app-with-mod_wsgi-on-ubuntu-14-04/
 
* Methods of setting up the environment for WSGI described in the current Django documentation may or may not work on a particular Apache installation, release or distro, due to a few subtle bugs and the relative complexity of *.conf and related files.
 
* If you decide to build mod_wsgi from source, make sure your Python was also built from source with the "./configure --enable-shared" option.
 
* It's easy to miss the fact that one segment of the path leading to wsgi.py doesn't have the right permissions, while httpd is run by user apache (or similar)
 
 
If you are willing to brave building mod_wsgi from source, here is a template:
 
<pre>
 
wget -q "https://github.com/GrahamDumpleton/mod_wsgi/archive/4.4.21.tar.gz"
 
tar -xzf '4.4.21.tar.gz'
 
cd ./mod_wsgi-4.4.21
 
./configure --with-python=/usr/local/bin/python3.5
 
make
 
make install
 
</pre>
 
 
===Ports and Firewalls===
 
SELinux may prevent you from configuring Apache with a non-standard port.
 
Useful commands:
 
<pre>
 
semanage port -l # list ports
 
semanage port -a -t http_port_t -p tcp 81 # add a rule
 
</pre>
 
 
List open ports:
 
<pre>
 
sudo nmap -sT -O localhost
 
# or
 
sudo lsof -i
 
</pre>
 
 
In addition to that, CentOS may have firewall settings which go beyond what you can learn with the tools listed above. See http://ask.xmodulo.com/open-port-firewall-centos-rhel.html.
 
To check the firewall rules:
 
<pre>
 
$ sudo iptables -L
 
</pre>
 
 
To open port 80 permanently:
 
<pre>
 
$ sudo firewall-cmd --zone=public --add-port=80/tcp --permanent
 
$ sudo firewall-cmd --reload
 
</pre>
 
 
==Deploying DB for Django==
 
===sqlite permissions===
 
Assuming you are using sqlite, the file permissions on the DB file do matter when you deploy under Apache.
 
So you either need to set wide permissions (may not be a good idea depending on the security situation) or
 
change the owner to "www-data" (on Ubuntu) or "apache" (on CentOS). Other OS may require similar tweaks.
 
 
===PostgreSQL===
 
An example of the "settings.py" clause:
 
<pre>
 
DATABASES = {
 
    'default': {
 
        'ENGINE': 'django.db.backends.postgresql',
 
        'NAME': 'foo',
 
        'USER': 'bar',
 
        'PASSWORD': '***',
 
        'HOST': '',
 
        'PORT': '',
 
    }
 
}
 
</pre>
 
  
 
=Misc Tools=
 

Revision as of 01:20, 1 July 2017

Intro

This page is a collection of (hopefully) useful information and trivia which may be required to build a Web service based on Django/Apache/PostgreSQL and to manage a small pool of machines for testing purposes.

Python

"Alternatives"

At the time of writing the system version of Python is often 2.7, whereas newer applications benefit from using Python 3.*. One way to deal with that is to use "env" in the hashbang line, pointing to the exact version you want to use. Apache/WSGI deployments may require additional footwork to ensure the correct version of the Python runtime is used in mod_wsgi etc.

Debian "Alternatives" - Debian has a way to specify the default version of an app. For example, if more than one version of Python is present on the system, the command "update-alternatives" can be used to activate any of the available choices.

Caution - it's not a good idea to switch from the version of Python which came with your distro, since there are documented and undocumented dependencies on that particular version in various places. Random things may break, such as software updates, applications like Dropbox etc. Caveat Emptor.

Remove an alternative version:

sudo update-alternatives --remove python /usr/bin/python3

The example above lets the system fall back on the previous version, such as Python 2.7.

It is recommended that instead of replacing the default, relevant scripts contain explicit reference to version 3+ if possible.
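
As an illustration of such an explicit reference, the sketch below writes a small script whose hashbang asks "env" for python3 and leaves the system default untouched. The script body and the temporary file location are made up for the example.

```shell
# Write a tiny script whose hashbang asks env to find python3 on PATH.
script="$(mktemp)"
cat > "$script" <<'EOF'
#!/usr/bin/env python3
import sys
print("Running under Python %d.%d" % sys.version_info[:2])
EOF
chmod +x "$script"

# Show the interpreter line that will be used to start the script.
head -n 1 "$script"
```

Running "$script" afterwards would use whichever python3 comes first on PATH, regardless of what plain "python" points to.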

"Enable Shared"

Certain applications require Python to use shared libraries. Python (e.g. 3.5) needs to be rebuilt from source for that:

./configure --enable-shared
make altinstall

pip3

The pip utility most often needs to be run under "sudo". There are some issues with that as explained below.

Certain versions of sudo (on some Linux distributions) "reset the environment" for security reasons, so most variables are unset. This may make installation work cumbersome. The policies that govern this are contained in the file /etc/sudoers. CAUTION: this file should only be edited with the "visudo" utility, which checks the syntax. If the file becomes invalid you may lose all sudo functionality, which in some cases is the only way to obtain root privileges; this will effectively "brick" the system.

Even if you do edit the "sudoers" file to preserve certain variables, there are exceptions. The variable LD_LIBRARY_PATH is notoriously clobbered no matter what you try. The way around it is to supply the value on the command line; more than one variable can be included. Example:

sudo LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python3.5 get-pip.py
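
The pattern can be tried without root by using the "env" utility, which accepts assignments in the same position as sudo does in the line above. This is only a sketch: DEMO_VAR is a made-up variable name, and env is a stand-in for sudo rather than a demonstration of sudo's own policy.

```shell
# Assignments given as arguments apply to the launched command only.
# DEMO_VAR is a made-up name used purely for illustration.
inside="$(env LD_LIBRARY_PATH=/usr/local/lib DEMO_VAR=hello \
    sh -c 'printf "%s|%s" "$LD_LIBRARY_PATH" "$DEMO_VAR"')"
echo "seen by the command: $inside"

# The calling shell itself is left unchanged.
echo "in the calling shell: DEMO_VAR=${DEMO_VAR:-<unset>}"
```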

Django, Apache and PostgreSQL

Misc Tools

ssh, telnet and other access methods

It is convenient to control a few machines from a single host. Typically ssh is used for this purpose, but if security is not a concern (e.g. when the network is strictly local) telnet can also be used as a quick solution. It can also serve to "bootstrap" ssh connectivity, i.e. to debug the ssh configuration remotely to make it operational.

Among the advantages of ssh is X11 forwarding, which telnet does not have.

ssh

You'll need to run the sshd service on every machine you want to connect to. On Linux, this is most frequently openssh-server and it can be trivially installed. Make sure there is an ssh entry in /etc/services, with the desired port number.

To be used productively, private and public keys will need to be generated or imported as necessary. For the private/public key pair to work, the public key should be added to the file ".ssh/authorized_keys" on the machine you connect to. The matching private key must be loaded into an identity managing service (e.g. ssh-agent in case of Linux) on the machine from which you are going to connect. If it's not cached, you will likely be prompted to enter the passphrase for the key.
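
The key handling described above can be sketched as follows. This is an assumption-laden demo: it works in a throwaway directory, uses an empty passphrase, and a local file stands in for the remote ".ssh/authorized_keys"; the key type and file names are arbitrary choices.

```shell
# Work in a throwaway directory so real keys in ~/.ssh are not touched.
keydir="$(mktemp -d)"

# Generate a key pair with an empty passphrase (acceptable only for a demo).
ssh-keygen -q -t ed25519 -N '' -C 'demo key' -f "$keydir/demo_key"

# On the machine you connect to, the public key is appended to
# ~/.ssh/authorized_keys; a local file stands in for it here.
cat "$keydir/demo_key.pub" >> "$keydir/authorized_keys"
chmod 600 "$keydir/authorized_keys"

# Print the fingerprint of the new public key.
ssh-keygen -lf "$keydir/demo_key.pub"
```

For real use, a passphrase should be set and the public key copied to the remote account by whatever means the site allows.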

Typically (this depends on the flavor of your sshd) you will get a message specifying which public key is used during the login that you are attempting. This is useful to know if you have many keys and forget which was used for what connection.

Restarting the service:

sudo systemctl restart ssh

Adding a key to the agent:

eval "$(ssh-agent -s)"
ssh-add key_file

You can also check which keys are loaded

ssh-add -l

Gateways such as the one operating at BNL and those at other labs typically require that your public key be uploaded and cached on their side in advance. The exact way this is done is site-dependent. Some sites require you to verify the upload by providing the public key's fingerprint. Example of how to get it:

ssh-keygen -E md5 -lf my_public_key_file

If you lost your public key (while still having your private one) you can re-create it:

ssh-keygen -yf my_private_key_file

Once it's done, a connection becomes possible, for example:

ssh username@atlasgw.usatlas.bnl.gov

The '-X' option is needed to enable X11 forwarding in a connection established in this manner.

Tunneling at BNL:

ssh -L 8080:130.199.23.54:3128 yourAccount@your.gateway.bnl.gov

The port 8080 is chosen as an example - it must be a number larger than a certain lower limit to satisfy a security policy. On your local machine, you would need to specify a proxy which looks like this:

localhost:8080

Another example when going from one Linux box to another:

ssh -L 8000:localhost:8000 myRemoteHost

The above gives you access to the remote port 8000 on the local machine via localhost:8000. Another example which works for accessing the neutdqm machine via http:

ssh -L 8008:neutdqm.cern.ch:8008 user@lxplus015.cern.ch

If there is a need to access an HTTPS site, port number 443 needs to be forwarded, and if there is a certificate issue it needs to be resolved either in the browser or, if wget is used, by applying the --no-check-certificate option.
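
The "-L" argument in the examples above always has the form local_port:target_host:target_port, followed by the gateway to log in to. A hypothetical helper that only prints the command (it does not open a connection) makes the pattern explicit; the local port 8443 in the HTTPS line is an arbitrary choice.

```shell
# Hypothetical helper: print (not run) an ssh port-forwarding command.
# Usage: tunnel_cmd LOCAL_PORT TARGET_HOST TARGET_PORT GATEWAY
tunnel_cmd() {
    printf 'ssh -L %s:%s:%s %s\n' "$1" "$2" "$3" "$4"
}

# The HTTP example from above, then an HTTPS variant forwarding port 443.
tunnel_cmd 8008 neutdqm.cern.ch 8008 user@lxplus015.cern.ch
tunnel_cmd 8443 neutdqm.cern.ch 443 user@lxplus015.cern.ch
```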

telnet

While ssh is in general preferable for many reasons, foremost security, sometimes there is a chicken-and-egg problem where you need to establish access quickly in order to debug ssh on a remote machine. In these cases, and if security is not a concern (rare, but could happen on an entirely internal network), one may opt to use telnet.

On Ubuntu one can install the software necessary to run the telnet service in the following manner:

sudo apt-get install xinetd telnetd

Make sure there is an entry in /etc/services which looks like

telnet        23/tcp

Also, create a file /etc/xinetd.d/telnet with contents similar to this:

service telnet
{
        disable         = no
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.telnetd
        log_on_failure  += USERID HOST
        log_on_success  += PID HOST EXIT
        log_type        = FILE /var/log/xinetd.log
}

...and start the service as follows:

sudo /etc/init.d/xinetd start

pdsh

This is an advanced parallel shell designed for cluster management. It often uses ssh as the underlying protocol, although there are other options as well. Configuration is defined by files residing in /etc/pdsh. For example, the file "machines" needs to contain the list of computers to be targeted by pdsh. Optionally, this is also the place for a file that can be sourced for convenience of setup, for example:

# setup pdsh for cluster users
export PDSH_RCMD_TYPE='ssh'
export WCOLL='/etc/pdsh/machines'

This of course can also be done from the command line, for example:

export PDSH_RCMD_TYPE=ssh

Using ssh as the underlying protocol for pdsh implies that you have set up private and public keys just like you normally would for ordinary ssh login. Once this is done, you should be able to do something like this as a basic test of your setup:

pdsh -w targetHost "ls"

If the targetHost is omitted, the command will be run against all machines listed in the "machines" file as explained above. Should a command fail on a particular machine, this will be indicated (with an error code) in the output of the command, with the name of the machine listed. Redirection of stderr with something like "2>/dev/null" included with the command you run won't work with pdsh.
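
A minimal sketch of the setup described above, using a scratch file instead of /etc/pdsh/machines so that it can be tried without root. The host names node01 to node03 are placeholders.

```shell
# Build a host list in a scratch file; node01..node03 are placeholder names.
machines="$(mktemp)"
printf '%s\n' node01 node02 node03 > "$machines"

# Point pdsh at the list and select ssh as the transport.
export PDSH_RCMD_TYPE=ssh
export WCOLL="$machines"

echo "pdsh would target $(wc -l < "$WCOLL" | tr -d ' ') hosts listed in $WCOLL"
# With working ssh keys, a command can then be run on all of them:
#   pdsh uptime
```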

Example of installation on CentOS:

sudo yum install pdsh

Misc

"nslookup" is a useful network information utility with diverse functionality. One simple function is to translate qualified host names to IP addresses and back.

"sha" headers one may need while installing xrootd can be obtained by running (on Ubuntu):

sudo apt-get install libssl-dev

...or as follows on CentOS

sudo yum install openssl openssl-devel

libssl may be necessary also for installation of pip3 etc.

A few other dependencies of xrootd can be met by installing glib2.0.

Version Control

Notify git of your identity:

git config --global user.email "yourname@yoursite.yourdomain"

To avoid entering git userID and password:

git config --global credential.helper 'cache --timeout 7200'

To address the usual "^M" problem when switching between Linux and Windows environments

$ git config --global core.autocrlf true

# Remove everything from the index
$ git rm --cached -r .

# Re-add all the deleted files to the index
# You should get lots of messages like: "warning: CRLF will be replaced by LF in <file>."
$ git diff --cached --name-only -z | xargs -0 git add

# Commit
$ git commit -m "Fix CRLF"

(Also see https://stackoverflow.com/questions/1889559/git-diff-to-ignore-m)

LaTeX

One can choose to install all of the TeX packages or just a few:

sudo apt install texlive texlive-humanities texlive-science

To see what is installed

dpkg -l

The two-letter code at the front of each line indicates the status of the package. "ii" means installed and "rc" means removed but with config files still around ("dpkg --purge" or "apt-get remove --purge" gets rid of the "rc" entries, but they are just harmless cruft).
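
To pick out only the "rc" entries, the status column can be filtered with awk. The helper name list_rc and the two sample lines are made up for illustration; real input would come from "dpkg -l".

```shell
# Print names of packages in state "rc" (removed, config files remain).
# Reads "dpkg -l"-style lines on standard input.
list_rc() {
    awk '$1 == "rc" { print $2 }'
}

# Two made-up sample lines stand in for real "dpkg -l" output.
printf '%s\n' \
    'ii  texlive-science  2017.1  all    TeX Live: science packages' \
    'rc  oldpkg           1.0-1   amd64  Example of a removed package' \
    | list_rc
```

On a Debian or Ubuntu system the equivalent call would be "dpkg -l | list_rc".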