Difference between revisions of "Linux Tools"
Revision as of 20:47, 23 March 2018
About this page
This page is a collection of various and often unrelated bits of information available elsewhere but kept here for quick reference and occasionally useful in building a functional system in the Linux environment.
Django, Apache and PostgreSQL
- Moved to a separate Django page.
- Moved to a separate Apache page.
- Moved to a separate PostgreSQL page.
Information on these pages may be important when bringing the system up after a reboot or an upgrade. For example, the SELinux settings are likely to be reset to their defaults, which may prevent Apache from accessing certain files even though the permissions appear to be correct.
Python
"Alternatives" (caveat emptor)
At the time of writing the system version of Python is often 2.7, whereas newer applications benefit from using Python 3.*. One way to deal with that is to use "env" in the hashbang line, pointing to the exact version you want to use. Apache/WSGI deployments may require additional footwork to ensure that the correct version of the Python runtime is used in mod_wsgi etc.
Debian "Alternatives" - Debian has a way to specify the default version of an app. For example, if more than one version of Python is present on the system, the command "update-alternatives" can be used to activate any of the available choices.
Caution - it's not a good idea to switch from the version of Python which came with your distro, since there are documented and undocumented dependencies on that particular version in various places. Random things may break, such as software updates, applications like Dropbox etc. Caveat Emptor.
Remove an alternative version:
sudo update-alternatives --remove python /usr/bin/python3
The example above allows you to fall back on the previous version, such as Python 2.7.
It is recommended that instead of replacing the default, relevant scripts contain explicit reference to version 3+ if possible.
Building Python from source
Certain applications (e.g. mod_wsgi) require Python to use shared libraries. Python (like 3.5) needs to be rebuilt for that:
./configure --enable-shared
make altinstall
If you expect that your applications will have a dependency on sqlite, another option must be added:
--enable-loadable-sqlite-extensions
...which should be done after
sudo yum install sqlite-devel
Building from source may also be required for other reasons. If your system is completely bare, you will need to install gcc before you can compile Python.
Please see notes below to understand what you need to do to ensure that the Python you are building has requisite support for ssl, zlib etc.
In user space:
./configure --prefix=$HOME/python
make && make install
pip3
The pip utility most often needs to be run under "sudo". There are some issues with that as explained below.
Certain versions of sudo (on some Linux distributions) "reset the environment" for security reasons, so most variables are unset. This may make installation work cumbersome. The policies that govern this are contained in the file /etc/sudoers. CAUTION - it should really only be edited with the "visudo" utility, which checks the syntax. If that file becomes invalid you may lose all sudo functionality, which in some cases is the only way to gain root privileges. This will effectively "brick" the system. Moreover, some variables are not preserved even if you do edit the "sudoers" file. The variable LD_LIBRARY_PATH is notoriously clobbered no matter what you try. The way around it is to supply the value on the command line; more than one variable can be included. Example:
sudo LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python3.5 get-pip.py
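The VAR=value form used in the sudo example can be tried out safely without root privileges. The sketch below uses "env" as a stand-in for sudo, since it accepts assignments in the same position; DEMO_LIB_PATH and DEMO_MODE are made-up variable names used only for illustration.

```shell
# "env" stands in for sudo here: both accept VAR=value arguments before the command.
# DEMO_LIB_PATH and DEMO_MODE are hypothetical names, not real settings.
result=$(env DEMO_LIB_PATH=/usr/local/lib DEMO_MODE=install \
    sh -c 'echo "$DEMO_LIB_PATH $DEMO_MODE"')
echo "seen by the child process: $result"

# The values were handed to the child only; the calling shell is unchanged.
echo "seen by the calling shell: ${DEMO_LIB_PATH:-unset}"
```

Whether sudo honours a particular assignment still depends on the local sudoers policy.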
There are caveats when running the command listed above. Zlib is needed, so
sudo yum install zlib-devel
Also, it needs ssl support in Python. So after the following steps (on CentOS)
sudo yum install openssl
sudo yum install openssl-devel
...one has to rebuild Python as described in the previous section.
Filesystem
Absolute path
import os
os.path.abspath("../myfile.txt")
shutil
The "shutil" package contains a plethora of useful methods to copy and modify files. Watch out for file metadata, which is handled in subtly different ways. For example, "copyfile" copies only the file contents, while "copy" also copies the permission bits.
Remote Access and Execution
Overview
It is convenient to control a few machines from a single host. Typically ssh is used for this purpose, but if security is not a concern (e.g. when the network is strictly local) telnet can also be used as a quick solution. It can also serve to "bootstrap" ssh connectivity, i.e. to debug the ssh configuration remotely and make it operational.
Among the advantages of ssh is X11 forwarding, which telnet does not offer.
ssh
Installation and keys
You'll need to run the sshd service on every machine you want to connect to. On Linux, this is most frequently openssh-server and it can be trivially installed. Make sure there is an ssh entry in /etc/services, with the desired port number.
To be used productively, private and public keys will need to be generated or imported as necessary. For the private/public key pair to work, public keys should be added to the file ".ssh/authorized_keys". A matching private key must be loaded to an identity managing service (e.g. ssh-agent in case of Linux) on the machine from which you are going to connect. If it's not cached, you will likely be prompted to enter the passphrase for the key.
Typically (this depends on the flavor of your sshd) you will get a message specifying which public key is used during the login that you are attempting. This is useful to know if you have many keys and forget which was used for what connection.
Restarting the service:
sudo systemctl restart ssh
Adding a key to the agent:
eval "$(ssh-agent -s)"
ssh-add key_file
You can also check which keys are loaded
ssh-add -l
In case of problems while connecting, it may be helpful to check the log on the ssh server machine: /var/log/auth.log.
Gateways such as the one operating at BNL and at other labs typically require that your public key be uploaded and cached on their side in advance. The exact way this can be done is site-dependent. Some sites require you to verify the upload by providing the public key's fingerprint. Example of how to get it:
ssh-keygen -E md5 -lf my_public_key_file
If you lost your public key (while still having your private one) you can re-create it:
ssh-keygen -yf my_private_key_file
Once it's done, a connection becomes possible, for example:
ssh username@atlasgw.usatlas.bnl.gov
The '-X' option is needed to enable X11 forwarding in a connection established in this manner.
Tunnels
Using proxies at BNL:
ssh -L 8080:130.199.23.54:3128 yourAccount@your.gateway.bnl.gov
The port 8080 is chosen as an example - it must be a number larger than a certain lower limit to satisfy a security policy. On your local machine, you would need to specify a proxy which looks like this:
localhost:8080
Another example when going from one Linux box to another:
ssh -L 8000:localhost:8000 myRemoteHost
The above gives you access to the remote port 8000 on the local machine via localhost:8000. For example, this works for accessing the neutdqm machine via http:
ssh -L 8008:neutdqm.cern.ch:8008 user@lxplus015.cern.ch
If there is a need to access an HTTPS site, port number 443 needs to be forwarded, and if there is a certificate issue it needs to be resolved either in the browser, or, if wget is used, by applying the --no-check-certificate option.
Passwords
There are cases when key-based auth is not suitable and one has to use passwords with ssh. To automate logging in one may choose to install and use the "sshpass" utility, provided the credentials you supply are not stored in the open.
telnet
While using ssh is in general preferable for many reasons, foremost due to security concerns, sometimes there is a chicken-and-egg problem where you need to establish access fast in order to debug ssh on a remote machine. In these cases, and if security is not a concern (rare, but could happen on an entirely internal network), one may opt to use telnet.
On Ubuntu one can install the software necessary to run the telnet service in the following manner:
sudo apt-get install xinetd telnetd
Make sure there is an entry in /etc/services which looks like
telnet 23/tcp
Also, create a file /etc/xinetd.d/telnet with contents similar to this:
service telnet
{
    disable        = no
    flags          = REUSE
    socket_type    = stream
    wait           = no
    user           = root
    server         = /usr/sbin/in.telnetd
    log_on_failure += USERID HOST
    log_on_success += PID HOST EXIT
    log_type       = FILE /var/log/xinetd.log
}
...and start the service as follows:
sudo /etc/init.d/xinetd start
pdsh
This is an advanced parallel shell designed for cluster management. It often uses ssh as the underlying protocol although there are other options as well. Configuration is defined by files residing in /etc/pdsh. For example, the file "machines" needs to contain the list of computers to be targeted by pdsh. Optionally, this is also the place for a file that can be sourced for convenience of setup, cf
# setup pdsh for cluster users
export PDSH_RCMD_TYPE='ssh'
export WCOLL='/etc/pdsh/machines'
This of course can be done from the command line anyway, cf
export PDSH_RCMD_TYPE=ssh
Using ssh as the underlying protocol for pdsh implies that you have set up private and public keys just like you normally would for ordinary ssh login. Once this is done, you should be able to do something like this as a basic test of your setup:
pdsh -w targetHost "ls"
If the targetHost is omitted, the command will be run against all machines listed in the "machines" file as explained above. Should a command fail on a particular machine, this will be indicated (with an error code) in the output of the command, with the name of the machine listed. Redirection of stderr with something like "2>/dev/null" included with the command you run won't work with pdsh.
Example of installation on CentOS:
yum install pdsh
Java
java -XshowSettings:properties -version
Miscellanea
Network
"nslookup" is a useful network information utility with diverse functionality. One simple function is to translate qualified host names to IP addresses and back.
"sha" headers one may need while installing xrootd can be obtained by running (on Ubuntu):
sudo apt-get install libssl-dev
...or as follows on CentOS
sudo yum install openssl openssl-devel
libssl may be necessary also for installation of pip3 etc.
A few other dependencies of xrootd can be met by installing glib2.0.
Shell
White space when using "sed":
$ sed -e "s/\s\{3,\}/  /g" inputFile
This will substitute every sequence of at least 3 whitespace characters with two spaces.
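A small check of the whitespace substitution on made-up input. This is a sketch that uses the portable class [[:space:]] in place of \s, which is a GNU sed extension; the sample line is an assumption.

```shell
# Sample data: 5 spaces after "alpha", 2 spaces after "beta",
# and a tab followed by 3 spaces after "gamma".
sample=$(printf 'alpha     beta  gamma\t   delta')

# Runs of 3 or more whitespace characters become exactly two spaces;
# the 2-space run is too short to match and is left as it is.
squeezed=$(printf '%s\n' "$sample" | sed -e 's/[[:space:]]\{3,\}/  /g')
printf '%s\n' "$squeezed"
```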
Produce a convenient timestamp for various uses:
date -d "today" +"%Y%m%d%H%M"
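One typical use of such a timestamp is in file names. The sketch below omits -d "today", which is the default anyway; the "backup_" prefix and ".tar.gz" suffix are placeholders chosen for this example.

```shell
# Take the timestamp once and reuse it, so related files share the same stamp.
stamp=$(date +"%Y%m%d%H%M")

# "backup_" and ".tar.gz" are placeholder name parts.
archive="backup_${stamp}.tar.gz"
echo "$archive"
```

Names built this way sort chronologically in a plain "ls" listing because the fields run from year down to minute.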
To get timestamps in history:
HISTTIMEFORMAT="%d/%m/%y %T "
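HISTTIMEFORMAT takes strftime-style format codes, which the "date" command also understands, so a format can be previewed before it goes into a startup file. A sketch, with "ls -l" standing in for a history entry:

```shell
# Same format string as in the HISTTIMEFORMAT setting (note the trailing space).
fmt="%d/%m/%y %T "

# date accepts the same codes, so this previews the prefix of each history line.
preview=$(date +"$fmt")
echo "${preview}ls -l"
```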
"find"
find . -maxdepth 1 -mmin +400
'mmin' means it accepts minutes, 'mtime' days.
Find and recursively delete directories modified more than 5 hours ago:
find . -maxdepth 1 -mindepth 1 -mmin +300 -exec rm -fr {} \;
If you don't specify 'mindepth', the current directory ('.') will show up in the results and will be passed to the delete command as well.
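Before running a destructive "find ... -exec rm", it may help to do a dry run with -print. The sketch below works in a throwaway directory and shows the effect of -mindepth; the directory names are made up, and "touch -d" with a relative date is a GNU extension.

```shell
# Throwaway directory, so nothing real is at risk.
workdir=$(mktemp -d)
mkdir "$workdir/old_dir" "$workdir/new_dir"

# Back-date one subdirectory and the top directory by 10 hours.
touch -d '10 hours ago' "$workdir/old_dir" "$workdir"

# Without -mindepth the starting directory itself is among the matches.
with_top=$(find "$workdir" -maxdepth 1 -mmin +300 -print | wc -l)

# With -mindepth 1 only entries below the starting directory are listed.
matches=$(find "$workdir" -maxdepth 1 -mindepth 1 -mmin +300 -print)
echo "matches without -mindepth: $with_top"
echo "matches with -mindepth 1:  $matches"

rm -rf "$workdir"
```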
"cksum" - calculates CRC and byte count.
Remove line breaks from a file:
echo $(cat $1)
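The unquoted command substitution above does more than drop line breaks: the shell splits the text into words, so repeated blanks collapse too (and characters like * may be expanded into file names). A sketch comparing it with "tr" on sample data:

```shell
# Small multi-line file with sample contents.
tmpfile=$(mktemp)
printf 'one\ntwo   three\nfour\n' > "$tmpfile"

# Word splitting turns every run of whitespace, newlines included, into one space.
joined=$(echo $(cat "$tmpfile"))
echo "[$joined]"

# tr replaces only the newlines and keeps the other whitespace as it was,
# including a trailing space where the last newline used to be.
kept=$(tr '\n' ' ' < "$tmpfile")
echo "[$kept]"

rm -f "$tmpfile"
```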
Redirect stdout to one file and stderr to another file:
command > out 2>error
Redirect stderr to stdout (&1), and then redirect stdout to a file:
command >out 2>&1
Redirect both to a file:
command &> out
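The redirection forms can be compared side by side with a helper that writes to both streams. In this sketch "emit" is a made-up function, and only the portable forms are used, since "&>" is specific to bash. The last case shows why the order of redirections matters.

```shell
# emit is a hypothetical helper: one line to stdout, one line to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

outdir=$(mktemp -d)

emit > "$outdir/out" 2> "$outdir/error"   # one file per stream
emit > "$outdir/both" 2>&1                # both streams in one file

# Reversed order: stderr is pointed at the old stdout first,
# so only stdout ends up in the file and stderr escapes.
leaked=$(emit 2>&1 > "$outdir/stdout_only")
echo "escaped the file: $leaked"
```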
Find the name of the file, minus the complete path:
f=$(basename /home/maxim/JOB.html)
echo $f
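A few related ways of taking a path apart, sketched with the same sample path. The suffix argument of basename and the ${var##*/} expansion are standard shell features.

```shell
path=/home/maxim/JOB.html

f=$(basename "$path")            # file name without the directory part
d=$(dirname "$path")             # directory part only
stem=$(basename "$path" .html)   # a second argument strips that suffix
inline=${path##*/}               # same as basename, without starting a process

echo "$f | $d | $stem | $inline"
```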
Crontab
- minute (from 0 to 59)
- hour (from 0 to 23)
- day of month (from 1 to 31)
- month (from 1 to 12)
- day of week (from 0 to 6) (0=Sunday)
crontab -r # clear out your crontab
crontab -l # list your crontab
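To see how the five time fields line up in an actual entry, the sketch below splits a sample crontab line into named parts. The entry and the script path in it are made up for illustration.

```shell
# Sample entry: run a (hypothetical) backup script at 02:30 every Monday.
entry='30 2 * * 1 /usr/local/bin/backup.sh'

# read assigns one field per variable; the last variable takes the rest of the line.
read -r minute hour dom month dow cmd <<EOF
$entry
EOF

echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
echo "command=$cmd"
```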
C++
String Comparison
// comparing apples with apples
#include <iostream>
#include <string>

int main ()
{
  std::string str1 ("green apple");
  std::string str2 ("red apple");

  if (str1.compare(str2) != 0)
    std::cout << str1 << " is not " << str2 << '\n';

  if (str1.compare(6,5,"apple") == 0)
    std::cout << "still, " << str1 << " is an apple\n";

  if (str2.compare(str2.size()-5,5,"apple") == 0)
    std::cout << "and " << str2 << " is also an apple\n";

  if (str1.compare(6,5,str2,4,5) == 0)
    std::cout << "therefore, both are apples\n";

  return 0;
}
ROOT I/O
File modes:
- NEW or CREATE: Create a new file and open it for writing. If the file already exists, it is not opened.
- RECREATE: Create a new file. If the file already exists, it will be overwritten.
- UPDATE: Open an existing file for writing. If no file exists, it is created.
- READ: Open an existing file for reading (default).
Version Control (git)
Starting out
Notify git of your identity and ID:
git config --global user.email "yourname@yoursite.yourdomain"
git config --global user.name yourID
To avoid entering git userID and password:
git config --global credential.helper 'cache --timeout 7200'
To address the usual "^M" problem when switching between Linux and Windows environments
$ git config --global core.autocrlf true
# Remove everything from the index
$ git rm --cached -r .
# Re-add all the deleted files to the index
# You should get lots of messages like: "warning: CRLF will be replaced by LF in <file>."
$ git diff --cached --name-only -z | xargs -0 git add
# Commit
$ git commit -m "Fix CRLF"
(Also see https://stackoverflow.com/questions/1889559/git-diff-to-ignore-m)
Restoring Files
First, see this link:
https://stackoverflow.com/questions/953481/find-and-restore-a-deleted-file-in-a-git-repository
A recipe that may work well:
git log --diff-filter=D --summary    # finds deleted files
git checkout $commit~1 filename      # where "$commit" stands for the actual commit name (a long string)
In the above, it's best to operate from the top-level directory of the project and use paths relative to it. Also, you may want to "git add" the restored files and commit them to make the restoration permanent.
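The recipe can be rehearsed in a throwaway repository before trying it on real work. In this sketch the file name, commit messages and identity are sample values, and "g" is a made-up wrapper that keeps the global git configuration untouched.

```shell
# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
git init -q "$repo"

# g is a hypothetical wrapper: run git inside the repo with a per-command identity.
g() { git -C "$repo" -c user.name="Demo User" -c user.email="demo@example.org" "$@"; }

echo "important notes" > "$repo/notes.txt"
g add notes.txt
g commit -q -m "Add notes"
g rm -q notes.txt
g commit -q -m "Remove notes"

# Find the commit that deleted the file, then take the file from its parent.
commit=$(g log --diff-filter=D --format=%H -n 1 -- notes.txt)
g checkout -q "$commit~1" -- notes.txt

restored=$(cat "$repo/notes.txt")
echo "$restored"
```

After the checkout the file is back in the working tree and already staged; committing it makes the restoration permanent.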
If you want to get a specific previous revision of a file, just capture the stdout of the following command:
git show $REV:$FILE
...and rename the output as you see fit.
Undoing a commit
See:
https://sethrobertson.github.io/GitFixUm/fixup.html
If you want to discard all uncommitted changes and reset the working tree to the latest commit (HEAD):
git reset --hard HEAD
To remove the last two commits:
git reset --hard HEAD~2
GitHub quirks
Sometimes a cloned repo will end up in a state where you can't push local content. Things you might want to try:
git remote set-url origin https://buddhasystem@github.com/DUNE/dqmconfig.git
And in case it was not annoying enough, if you see something like "can't open display" this may help:
unset SSH_ASKPASS
LaTeX
One can choose to install all of the TeX packages or just a few:
apt install texlive texlive-humanities texlive-science
To see what is installed
dpkg -l
The little two-letter code at the front of each line gives the status of the package. "ii" means installed and "rc" means removed but with config files still around ("dpkg --purge" or "apt-get remove --purge" gets rid of the "rc" entries, but they are just harmless cruft).
HTCondor
It is often desirable to dynamically modify the content of the condor submit file (typically having the JDL extension). While it does not appear possible to access the shell environment variables within the submit file directly, a similar effect can be obtained by setting the internal HTCondor parameters on the command line, cf:
condor_submit A=100 foo.jdl
Then, one can access the value of "A" within the JDL file as $(A).
To find the number of idle jobs:
/usr/bin/condor_q 2>&1| tail -1 | cut -d' ' -f 7
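The field position used by "cut" depends on the exact wording of the condor_q summary line, which differs between HTCondor versions. The sketch below runs the same extraction on an assumed sample line (not captured from a real pool) and shows a variant that looks for the "idle" label instead of counting fields.

```shell
# Assumed sample of a condor_q summary line; real output may be worded differently.
summary='12 jobs; 0 completed, 0 removed, 5 idle, 7 running, 0 held, 0 suspended'

# Same extraction as above: the 7th space-separated field of the last line.
idle=$(printf '%s\n' "$summary" | tail -1 | cut -d' ' -f 7)
echo "idle (by position): $idle"

# Variant keyed on the label, which survives a change in field order.
idle_by_label=$(printf '%s\n' "$summary" | sed -n 's/.* \([0-9][0-9]*\) idle.*/\1/p')
echo "idle (by label):    $idle_by_label"
```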