Category Archives: Troubleshooting Errors

How to troubleshoot RPC: Port mapper failure – Timed out error

Learn how to troubleshoot the RPC: Port mapper failure – Timed out error on an NFS client. This will help you resolve NFS mounts timing out.

RPC: Port mapper failure – Timed out

In this article, we are going to troubleshoot one of the NFS errors you see on NFS clients. You can hit this error while running NFS-related commands like the ones below :

root@kerneltalks # showmount -e mynfsserver
clnt_create: RPC: Port mapper failure - Timed out

root@kerneltalks # rpcinfo -p  mynfsserver
mynfsserver: RPC: Port mapper failure - Timed out

Normally, when you see this error you are also unable to mount the NFS share. You will see the mount.nfs: Connection timed out error when you try to mount it.

root@kerneltalks # mount mynfsserver:/data /nfs_data
mount.nfs: Connection timed out

Troubleshooting steps

Follow the below troubleshooting steps to fix the RPC: Port mapper failure - Timed out error.

Check NFS services on NFS server

First, check if NFS server services are running smoothly on the NFS server.

root@mynfsserver # service nfs-server status
nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled)
  Drop-In: /usr/lib/systemd/system/nfs-server.service.d
           └─nfsserver.conf
        /run/systemd/generator/nfs-server.service.d
           └─order-with-mounts.conf
   Active: active (exited) since Tue 2017-11-07 15:58:08 BRST; 6 days ago
 Main PID: 1586 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service

The above output is from a SUSE Linux server. The output may look different on different Linux distros. If the service is not running or is hung, you may need to restart NFS services.
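
If you need to restart it, the commands below are a typical example on a systemd-based distro; the exact service name varies (nfs-server, nfsserver, or nfs) depending on your distribution.

root@mynfsserver # systemctl restart nfs-server
root@mynfsserver # systemctl status nfs-server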

Check connectivity between NFS server and client

Make sure you are able to reach the NFS server from your client. Check using ping, then test the NFS ports 111 and 2049 over both TCP and UDP (telnet covers TCP; use nc for UDP).

root@kerneltalks #  ping mynfsserver
PING lasnfsp01v.la.holcim.net (10.186.1.22) 56(84) bytes of data.
64 bytes from 10.186.1.22: icmp_seq=1 ttl=56 time=3.92 ms
64 bytes from 10.186.1.22: icmp_seq=2 ttl=56 time=3.74 ms
64 bytes from 10.186.1.22: icmp_seq=3 ttl=56 time=3.82 ms
^C
--- mynfsserver ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 3.748/3.830/3.920/0.086 ms

root@kerneltalks # telnet 10.186.1.22 2049
Trying 10.186.1.22...
Connected to 10.186.1.22.
Escape character is '^]'.
root@kerneltalks # nc -v -u mynfsserver 111
Connection to mynfsserver 111 port [udp/sunrpc] succeeded!
^C
root@kerneltalks # nc -v -u mynfsserver 2049
Connection to mynfsserver 2049 port [udp/nfs] succeeded!
^C
root@kerneltalks # nc -v mynfsserver 111
Connection to mynfsserver 111 port [tcp/sunrpc] succeeded!
^C
root@kerneltalks # nc -v mynfsserver 2049
Connection to mynfsserver 2049 port [tcp/nfs] succeeded!
^C

Check if RPC info is reachable from client

Run below command to check if you can read RPC information of the NFS server from the client machine.

root@kerneltalks # rpcinfo -p 10.186.1.22
   program vers proto   port  service
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100005    1   udp  20048  mountd
    100005    1   tcp  20048  mountd
    100005    2   udp  20048  mountd
    100005    2   tcp  20048  mountd
    100005    3   udp  20048  mountd
    100005    3   tcp  20048  mountd
    100024    1   udp   4000  status
    100024    1   tcp   4000  status
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    3   udp   2049  nfs_acl
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr

If you have a connectivity issue, then you will see the mynfsserver: RPC: Port mapper failure - Timed out error here as well.

Check if you can read exported share info from client

Use the below command to check if you can read exported share info from the client.

root@kerneltalks # showmount -e  10.186.1.22
Export list for  10.186.1.22:
/data *(rw,sync,no_root_squash)

Check if antivirus kernel modules are blocking NFS

Lastly, if you have SEP 14 (Symantec Endpoint Protection) antivirus installed on your server, then you need to uninstall it. For some mysterious reason, SEP 14 holds on to the nfsd module and breaks NFS altogether. You may see the below messages in dmesg on the NFS server, which confirm that SEP kernel modules are interfering with NFS :

symev_custom_4_12_14_95_48_default_x86_64: loading out-of-tree module taints kernel.
symev_custom_4_12_14_95_48_default_x86_64: module verification failed: signature and/or required key missing - tainting kernel
symap_custom_4_12_14_95_48_default_x86_64: module license 'Proprietary' taints kernel.
symev: hold nfsd module
BUG: unable to handle kernel paging request at ffffffffc068f8b4
IP: svc_tcp_accept+0x1a6/0x320 [sunrpc]

You need a reboot after uninstalling the antivirus, since its kernel modules stay loaded and do not get freed by the uninstall. You can't even remove them the way you normally remove a module from a running kernel, as shown below. So a reboot is the way to go after uninstalling the antivirus.

root@kerneltalks # lsmod |grep sym
symev_custom_4_12_14_95_48_default_x86_64    98304  1

root@kerneltalks # modprobe -r symev_custom_4_12_14_95_48_default_x86_64
modprobe: FATAL: Module symev_custom_4_12_14_95_48_default_x86_64 is in use.

root@kerneltalks # rmmod  symev_custom_4_12_14_95_48_default_x86_64
rmmod: ERROR: Module symev_custom_4_12_14_95_48_default_x86_64 is in use

root@kerneltalks # lsmod | grep sym
symev_custom_4_12_14_95_48_default_x86_64    98304  1

root@kerneltalks # modinfo symev_custom_4_12_14_95_48_default_x86_64
filename:       /lib/modules/4.12.14-95.48-default/kernel/drivers/char/symev-custom-4-12-14-95-48-default-x86-64.ko
modinfo: ERROR: could not get modinfo from 'symev_custom_4_12_14_95_48_default_x86_64': No such file or directory

Once all the above commands return the expected output, you will be able to mount your share without any issues, and the error will be resolved.


How to resolve connectivity issue

To resolve a connectivity issue between the two servers, first confirm on the network end that they can communicate with each other. If you are running Linux EC2 instances on AWS, then you might need to check security groups to allow the proper traffic.

On the OS front, you may need to check iptables settings and allow the NFS ports. SELinux is another area to explore if you run a customized SELinux policy on your server; by default, SELinux allows NFS traffic.
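
If you do need to open the NFS ports in iptables, the below rules are one example only; rule ordering, persistence, and additional ports (like mountd) depend on your environment. On firewalld-based systems, firewall-cmd --permanent --add-service=nfs followed by firewall-cmd --reload does the equivalent.

root@mynfsserver # iptables -A INPUT -p tcp --dport 111 -j ACCEPT
root@mynfsserver # iptables -A INPUT -p udp --dport 111 -j ACCEPT
root@mynfsserver # iptables -A INPUT -p tcp --dport 2049 -j ACCEPT
root@mynfsserver # iptables -A INPUT -p udp --dport 2049 -j ACCEPT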

Access denied error in NFS for root account

Learn how to resolve access denied issues on an NFS mount point. Understand how root access is limited in NFS and when no_root_squash should be used.

Access Denied in NFS for root account

Current setup

An access denied error appears on the NFS mount point when attempting to create a file or directory, even though the rw option is set while exporting.

I had a directory named mydata which is exported from the NFS server. My /etc/exports file looks like this –

root@kerneltalks # cat /etc/exports
/mydata     10.0.2.34(rw,sync)

I mounted it on the NFS client client1 successfully. I am able to read all data within this directory from the NFS client.

root@client1 # mount kerneltalks:/mydata /nfs_data
root@client1 # ls -lrt /nfs_data

Issue

I am not able to create a file or directory in the NFS mount even though the rw option is set. I tried creating files and directories and got an access denied error.

root@client1 # cd /nfs_data

root@client1 # touch testfile
touch: cannot touch ‘testfile’: Access denied

root@client1 # mkdir testdir
mkdir: cannot create directory ‘testdir’: Access denied

Solution

By default, NFS prevents remote root users from gaining root-level privileges on its exports. It maps remote root users to the unprivileged nfsnobody user. This is what happened here: even though the rw option is set, since we are accessing the mount as root we are not able to write any data to the export.

This is called squashing root privileges down to normal ones. It protects exports against accidental writes or modifications. You can also set the all_squash option, which squashes the privileges of all remote users, including root, to the normal user nfsnobody.
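
For illustration only, an /etc/exports entry using all_squash would look like the below line; this is the opposite of what we need for our issue.

/mydata     10.0.2.34(rw,sync,all_squash)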

For our issue, we have to set the no_root_squash option on the export so that the remote root user keeps root privileges intact and is able to write to the exported directory.

I changed my /etc/exports as below :

root@kerneltalks # cat /etc/exports
/mydata     10.0.2.34(rw,sync,no_root_squash)

I re-exported the directory using exportfs. Re-exporting does not require clients to un-mount the exported directories. It also avoids an NFS server restart while picking up the new configuration.

root@kerneltalks # exportfs -ra

That’s it! Now I am able to create files and directories in the exported directory on NFS client.

root@client1 # cd /nfs_data
root@client1 # touch testfile
root@client1 # mkdir testdir

Conclusion

When you are using NFS mount points with the root account on the client side, export them with the no_root_squash option. This will ensure you don't face access-related issues on NFS mount points.

check_mk error Cannot fetch deployment URL via curl error

Article explaining ‘ERROR Cannot fetch deployment URL via curl: Couldn’t resolve host. The given remote host was not resolved.’ and how to resolve it.

check_mk register error

check_mk is a utility that helps you configure your server to be monitored via the Nagios monitoring tool. While configuring one of the clients, I came across the below error :

ERROR Cannot fetch deployment URL via curl: Couldn't resolve host. The given remote host was not resolved.

This error came after I tried to register the client with the monitoring server using the below command :

root@kerneltalks # /usr/bin/cmk-update-agent register -s monitor.kerneltalks.com -i master -H `hostname` -p http -U omdadmin -S ASFKWEFUNSHEFKG -v

Here in this command –

-s is the monitoring server
-i is the name of the Check_MK site on that server
-H is the hostname to fetch the agent for
-p is the protocol, either HTTP or HTTPS (default is HTTPS)
-U is the user ID of a user who is allowed to download the agent
-S is the secret, i.e. the automation secret of that user (in case of an automation user)

From the error, you can figure out that the command is not able to resolve the monitoring server DNS name monitor.kerneltalks.com.

Solution

It's pretty simple. Check /etc/resolv.conf to make sure you have the proper DNS server entries for your environment. If that still doesn't resolve the issue, then you can add an entry in /etc/hosts for it.

root@kerneltalks # cat /etc/hosts
10.0.10.9 monitor.kerneltalks.com
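
To confirm the entry is picked up before retrying the registration, you can resolve the name through the system resolver, which consults /etc/hosts first on most setups :

root@kerneltalks # getent hosts monitor.kerneltalks.com
10.0.10.9       monitor.kerneltalks.com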

That's it. You should now be able to register successfully.

root@kerneltalks # /usr/bin/cmk-update-agent register -s monitor.kerneltalks.com -i master -H `hostname` -p http -U omdadmin -S ASFKWEFUNSHEFKG -v
Going to register agent at deployment server
Successfully registered agent for deployment.
You can now update your agent by running 'cmk-update-agent -v'
Saved your registration settings to /etc/cmk-update-agent.state.

By the way, you can directly use the IP address for the -s switch and skip DNS resolution, and the error itself, altogether!
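
For example, the same registration command using the IP address directly (the one from /etc/hosts above; substitute yours) would be :

root@kerneltalks # /usr/bin/cmk-update-agent register -s 10.0.10.9 -i master -H `hostname` -p http -U omdadmin -S ASFKWEFUNSHEFKG -v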

mount.nfs: requested NFS version or transport protocol is not supported

Troubleshooting error ‘mount.nfs: requested NFS version or transport protocol is not supported’ and how to resolve it. 

Resolve NFS error

Another troubleshooting article aimed at a specific error and how to solve it. In this article, we will see how to resolve the error 'mount.nfs: requested NFS version or transport protocol is not supported' seen on an NFS client while trying to mount an NFS share.

# mount 10.0.10.20:/data /data_on_nfs
mount.nfs: requested NFS version or transport protocol is not supported

Sometimes you see the error mount.nfs: requested NFS version or transport protocol is not supported when you try to mount an NFS share on an NFS client. There are a couple of reasons you see this error :

  1. NFS services are not running on NFS server
  2. NFS utils not installed on the client
  3. NFS service hung on NFS server

NFS services on the NFS server can be down or hung due to multiple reasons like high server utilization, a server reboot, etc.

Solution 1:

To get rid of this error and successfully mount your share, follow the below steps.

Login to the NFS server and check the NFS services status.

[root@kerneltalks]# service nfs status
rpc.svcgssd is stopped
rpc.mountd is stopped
nfsd is stopped
rpc.rquotad is stopped

In the above output you can see the NFS services are stopped on the server. Start them.

[root@kerneltalks]# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]

You might want to check for the nfs-server or nfsserver service instead, depending on your Linux distro.

Now try to mount the NFS share on the client. You will be able to mount it using the same command we saw earlier!

Solution 2 :

If that doesn't work for you, then try installing the NFS client utilities (nfs-utils) on the client machine and you will get through this error.
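
Package names differ between distributions; the below commands are the usual candidates (on Debian/Ubuntu the client-side package is nfs-common).

[root@kerneltalks]# yum install nfs-utils         # Red Hat, CentOS
[root@kerneltalks]# zypper install nfs-client     # SUSE
[root@kerneltalks]# apt-get install nfs-common    # Debian, Ubuntu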

Solution 3 :

Open the file /etc/sysconfig/nfs on the NFS server and check the below parameters :

# Turn off v4 protocol support
#RPCNFSDARGS="-N 4"
# Turn off v2 and v3 protocol support
#RPCNFSDARGS="-N 2 -N 3"

Un-commenting an RPCNFSDARGS line (removing the hash) turns off support for the NFS versions it lists, so clients requesting those versions won't be able to connect to the NFS server to mount the share. If any of these lines is un-commented, comment it back (or adjust it), restart the NFS server service, and try mounting from the client again.
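
After editing /etc/sysconfig/nfs, restart the NFS service and confirm which versions the server now advertises; the below is a quick example, assuming the rpcinfo utility is available.

[root@kerneltalks]# service nfs restart
[root@kerneltalks]# rpcinfo -p localhost | grep nfs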

Let us know in the comments below if you have faced this error and solved it by any other method. We will update our article with your information to keep it current and help the community!

device eth0 does not seem to be present, delaying initialization error on Linux VM

Step by step procedure to resolve “device eth0 does not seem to be present, delaying initialization.” error on Linux VM.

eth0 error on Linux vm

If you are working with VMware infrastructure and your Linux VMs are hosted on it, you must have come across the below error while bringing up Ethernet in Linux :

Bringing up interface eth0: Device eth0 does not seem to be present, delaying initialization.

root@kerneltalks # service network restart
Shutting down interface eth0:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  Device eth0 does not seem to be present, delaying initialization.
                                                           [FAILED]

In this article, we are going to resolve this error. First, let's look at its background. This method works well on Red Hat, CentOS, Oracle Linux, etc.

This error normally comes up in a VM which is cloned from another Linux VM or from a template. The root cause is that the MAC address of eth0 recorded in its configuration file ifcfg-eth0 is the same as the source (source VM or template), while at boot the NIC gets a new, unique MAC address which does not match the one in ifcfg-eth0.
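
You can confirm the mismatch by comparing the MAC address the NIC actually received with the one recorded in the config file, for example :

root@kerneltalks # ip link show
root@kerneltalks # grep HWADDR /etc/sysconfig/network-scripts/ifcfg-eth0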

So here are steps to resolve this error.

Step 1.

Remove the file /etc/udev/rules.d/70-persistent-net.rules and reboot the server.

root@kerneltalks # rm /etc/udev/rules.d/70-persistent-net.rules
root@kerneltalks # reboot

This ensures a fresh file is generated at the next boot with the newly assigned unique MAC.

Step 2.

After the reboot, confirm the above file has been generated again. It will look like :

root@kerneltalks # cat /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device 0x15ad:0x07b0 (vmxnet3)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:56:99:3f:25", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

Observe that a new MAC is generated and assigned to eth1 (NAME="eth1"). Note down the MAC address from the file.

Step 3.

Now you have two choices :

  1. Use eth1 as the device name in the ifcfg-eth0 config file.
  2. Edit /etc/udev/rules.d/70-persistent-net.rules and change the interface name back to eth0

If you choose the first option, then along with the name change you also need to change the MAC address.

root@kerneltalks # cat ifcfg-eth0
DEVICE=eth1
HWADDR=00:50:56:99:3f:25
TYPE=Ethernet
---- output truncated ----

If you choose the second option, edit your ifcfg-eth0 located under /etc/sysconfig/network-scripts with the new MAC address from the above file. Also, since you made changes to /etc/udev/rules.d/70-persistent-net.rules, you have to reboot the server again. Rebooting the server here is important.

root@kerneltalks # cat ifcfg-eth0
DEVICE=eth0
HWADDR=00:50:56:99:3f:25
TYPE=Ethernet
---- output truncated ----

I would suggest going with the second choice since it maintains naming conventions on your system. The first choice may create confusion for fellow sysadmins if you have more than one NIC on your server.

Step 4.

If you used choice 1, i.e. device name eth1 in the config file ifcfg-eth0, then you just need to restart the network service and you should be all set.

root@kerneltalks # service network restart
Shutting down interface eth0:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]

If you opted for the second choice, then the reboot has already taken care of things, and your Ethernet interface along with its IP should be up post-boot.
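
Either way, you can quickly verify the interface is up with its IP address, for example :

root@kerneltalks # ip addr show eth0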

How to unmount NFS when the server is offline

Learn how to unmount NFS when the server is gone. Dead NFS mounts can be un-mounted using the forceful and lazy umount options.

Unmount NFS share when server is gone

This article will help you un-mount an NFS share from the client when the NFS server is gone, offline, unavailable, or decommissioned. We have seen how to configure an NFS server and how to handle the NFS stale file error. But what if your NFS server is gone and its shares are still mounted on clients? Normally, before shutting down an NFS server, all clients should be notified and advised to unmount the NFS shares they use from it.

But if any client still has the NFS share mounted when the server goes down, then the client has to un-mount it forcefully. A normal umount operation won't be effective in such cases. When the NFS server is down, you observe the below things on a client which still has the NFS share mounted.

  1. df command hangs since it tries to fetch NFS information but the NFS server is not responding.
  2. Tools and utilities which use or check mount point information, like Ignite backup, show the below error.
    NFS server xyz not responding still trying
  3. fuser command hangs when run against the NFS mount point.
  4. umount (normal) command fails with the below error.
    root@kerneltalks # umount /data
    nfs umount: nfs_unmount: /data: is busy
    umount: return error 1

In this case, you need to use a forceful (-f switch) and lazy (-l switch) umount to un-mount this dead NFS mount point. A lazy un-mount detaches the mount point from the filesystem tree immediately and cleans up all its references once it is no longer busy.

Lazy un-mount is available in most Linux distributions. If not, you should be fine with only a forceful un-mount. On HP-UX, lazy un-mount is not available.

root@kerneltalks # umount -f -l /data

Conclusion

You can identify a dead NFS share by df and fuser commands that appear to hang and a umount command that fails. Such a dead NFS mount point can be un-mounted using a forceful and lazy umount.

How to resolve aclocal: not found error in Ubuntu

Troubleshooting steps to clear the aclocal: not found error in Ubuntu. Install the mentioned package dependencies and you will be all set.

Resolve aclocal: not found error in Ubuntu

Recently I faced an issue while installing the s3fs utility. I saw the below error :

./autogen.sh: 38: ./autogen.sh: aclocal: not found

This was while executing autogen.sh script.

# ./autogen.sh
--- Make commit hash file -------
--- Finished commit hash file ---
--- Start autotools -------------
./autogen.sh: 38: ./autogen.sh: aclocal: not found
--- Finished autotools ----------

So a missing package stopped the script from executing. We will walk through the process to resolve it.

First, you have to install autotools-dev package on your machine.

# apt-get install autotools-dev

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  autotools-dev
0 upgraded, 1 newly installed, 0 to remove and 35 not upgraded.
Need to get 39.8 kB of archives.
After this operation, 155 kB of additional disk space will be used.
Get:1 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 autotools-dev all 20150820.1 [39.8 kB]
Fetched 39.8 kB in 0s (104 kB/s)
Selecting previously unselected package autotools-dev.
(Reading database ... 56705 files and directories currently installed.)
Preparing to unpack .../autotools-dev_20150820.1_all.deb ...
Unpacking autotools-dev (20150820.1) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up autotools-dev (20150820.1) ...

Once completed, proceed with installing automake package.

# apt-get install automake

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  autoconf m4
Suggested packages:
  autoconf-archive gnu-standards autoconf-doc libtool gettext
The following NEW packages will be installed:
  autoconf automake m4
0 upgraded, 3 newly installed, 0 to remove and 35 not upgraded.
Need to get 1,025 kB of archives.
After this operation, 3,781 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 m4 amd64 1.4.17-5 [195 kB]
Get:2 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 autoconf all 2.69-9 [321 kB]
Get:3 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 automake all 1:1.15-4ubuntu1 [510 kB]
Fetched 1,025 kB in 1s (920 kB/s)
Selecting previously unselected package m4.
(Reading database ... 56719 files and directories currently installed.)
Preparing to unpack .../archives/m4_1.4.17-5_amd64.deb ...
Unpacking m4 (1.4.17-5) ...
Selecting previously unselected package autoconf.
Preparing to unpack .../autoconf_2.69-9_all.deb ...
Unpacking autoconf (2.69-9) ...
Selecting previously unselected package automake.
Preparing to unpack .../automake_1%3a1.15-4ubuntu1_all.deb ...
Unpacking automake (1:1.15-4ubuntu1) ...
Processing triggers for install-info (6.1.0.dfsg.1-5) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up m4 (1.4.17-5) ...
Setting up autoconf (2.69-9) ...
Setting up automake (1:1.15-4ubuntu1) ...
update-alternatives: using /usr/bin/automake-1.15 to provide /usr/bin/automake (automake) in auto mode

This automake package will get you through! After installing both of them, I ran the script again and it was successful.

# ./autogen.sh
--- Make commit hash file -------
--- Finished commit hash file ---
--- Start autotools -------------
configure.ac:30: installing './compile'
configure.ac:26: installing './config.guess'
configure.ac:26: installing './config.sub'
configure.ac:27: installing './install-sh'
configure.ac:27: installing './missing'
src/Makefile.am: installing './depcomp'
parallel-tests: installing './test-driver'
--- Finished autotools ----------

Conclusion

To resolve the aclocal: not found error, install the autotools-dev and automake packages in Ubuntu. This will resolve your error.
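
For a quick recap, the below single command installs both the packages we installed separately above on an APT based system.

# apt-get install autotools-dev automake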

11 log files you should see on your Linux system

Listing of important Linux log files and their formats. These logs play a vital role in troubleshooting and every sysadmin should be aware of them.

Important log files in Linux

Troubleshooting any issue with your system needs proper knowledge of log file structure and locations. As a sysadmin, you should know which log file to check for a particular service issue. In this article, we will walk through several hand-picked log files which are very helpful for preliminary analysis during troubleshooting. The 11 log files we selected to discuss here are :

  1. /var/log/messages : General message and system related stuff
  2. /var/log/boot.log : Services success/failures at boot
  3. /var/log/secure or /var/log/auth.log : Authentication log
  4. /var/log/utmp or /var/log/wtmp : Login records
  5. /var/log/btmp  : Failed login records
  6. /var/log/cron : Scheduler log file
  7. /var/log/maillog : Mail logs
  8. /var/log/xferlog : File transfer logs
  9. /var/log/lastlog : Last login details
  10. dmesg : Device driver messages
  11. /var/crash logs : System crash dump

Let's see each log file one by one.

System consolidated log file : /var/log/messages

All system services which do not have their own dedicated log file normally write to this log file. Most of the system activity can be seen here, hence it's also called syslog (the system's log). Every sysadmin opens up this log first when troubleshooting! A sample log looks like below :

May 22 02:00:29 server1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="999" x-info="http://www.rsyslog.com"] exiting on signal 15.
May 22 02:00:29 server1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
May 22 02:00:29 server1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1698" x-info="http://www.rsyslog.com"] start
May 22 02:17:43 server1 dhclient[916]: DHCPREQUEST on eth0 to 172.31.0.1 port 67 (xid=0x445faedb)

The file can be read from left to right as :

  1. Date
  2. Time
  3. System hostname
  4. Service name (and sometimes PID as well)
  5. Message text

The system hostname adds value when you have a centralized syslog server. The message text and service name help you narrow your search. You can directly grep this file to find specific service-related logs, omitting other clutter. All file operations like cat, more, less, etc. can be done on this file since it's a plain text file.
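
For example, to pull only the DHCP client entries from the sample above, a simple grep like the below works :

# grep dhclient /var/log/messages
May 22 02:17:43 server1 dhclient[916]: DHCPREQUEST on eth0 to 172.31.0.1 port 67 (xid=0x445faedb)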

Since this log fills up very fast on a busy system (as it logs almost everything), you should consider configuring logrotate for it so that it won't fill up your mount point.
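
A minimal logrotate snippet for it could look like the below; the values are illustrative, and most distros already ship a similar default under /etc/logrotate.d/.

/var/log/messages {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}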

Service success/failures at boot : /var/log/boot.log

At the time of booting a Linux server, you can see services being started, with their success or failure status displayed on the local console. The same logs can be obtained post-boot from the boot log. This file lists every service's success/failure status at boot time so that it can be referred to later to troubleshoot any service-related issues.

$ cat /var/log/boot.log
growroot: FAILED: GPT partition found but no sgdisk
                Welcome to Red Hat Enterprise Linux Server
Starting udev: udevd[404]: can not read '/etc/udev/rules.d/75-persistent-net-generator.rules'
udevd[404]: can not read '/etc/udev/rules.d/75-persistent-net-generator.rules'

                                                           [  OK  ]
Setting hostname ip-172-31-1-120.ap-south-1.compute.interna[  OK  ]
Setting up Logical Volume Management:                      [  OK  ]
Checking filesystems
/dev/xvda1: clean, 77980/393216 files, 811583/1572864 blocks
                                                           [  OK  ]
Remounting root filesystem in read-write mode:             [  OK  ]
Mounting local filesystems:                                [  OK  ]
Enabling local filesystem quotas:                          [  OK  ]
Enabling /etc/fstab swaps:                                 [  OK  ]
Entering non-interactive startup

The above sample file shows services, their status (on the right), and any error/warning messages written to the console (by service daemons).

Authentication logs : /var/log/secure, /var/log/auth.log

This log file is crucial for checking user access. It stores information on user logins along with the authentication used. It also stores sudo logs.

Jun 14 23:41:00 server1 sshd[1586]: Accepted publickey for ec2-user from 59.184.130.135 port 51265 ssh2
Jun 14 23:41:01 server1 sshd[1586]: pam_unix(sshd:session): session opened for user ec2-user by (uid=0)
Jun 14 23:41:04 server1 sudo: ec2-user : TTY=pts/0 ; PWD=/home/ec2-user ; USER=root ; COMMAND=/bin/su -
Jun 14 23:41:04 server1 su: pam_unix(su-l:session): session opened for user root by ec2-user(uid=0)
Jun 14 23:43:45 server1 su: pam_unix(su-l:session): session closed for user root

Log file can be read from left to right as :

  1. Date
  2. Time
  3. Server hostname
  4. Authentication service or daemon (sometimes along with PID)
  5. Message

Login records : /var/log/utmp, /var/log/wtmp

The utmp and wtmp files store user login details. utmp stores current login details like user ID, time, duration, terminal used, system reboot details, etc., while wtmp provides historical utmp data. These are not plain text files and hence need to be passed to the last command to read the data within. You can use last -f <path> to read a file.

ec2-user pts/1        59.184.170.243   Mon Jun 19 07:24   still logged in
ec2-user pts/0        59.184.170.243   Mon Jun 19 07:21 - 07:24  (00:02)
reboot   system boot  2.6.32-696.el6.x Mon Jun 19 07:20 - 07:39  (00:18)
ec2-user pts/0        59.184.130.135   Wed Jun 14 23:41 - 00:09  (00:28)
reboot   system boot  2.6.32-696.el6.x Wed Jun 14 23:40 - 00:09  (00:29)

In the above output of last -f /var/log/wtmp you can see, from left to right :

  1. User or event (reboot)
  2. Terminal used
  3. IP from which user was connected
  4. Date/time of login
  5. Date /time of log out
  6. Duration of session

Failed login records : /var/log/btmp

It's a file dedicated to logging only failed login attempts. This too can be read using the last command as mentioned above.

user1    ssh:notty    31.207.47.50     Tue Jun 13 01:18    gone - no logout
user     ssh:notty    31.207.47.50     Tue Jun 13 01:18 - 01:18  (00:00)
ubnt     ssh:notty    31.207.47.50     Tue Jun 13 01:18 - 01:18  (00:00)

It shows the user ID, terminal, source IP, and time of each failed login attempt.
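
Since last reads /var/log/wtmp by default, its companion command lastb reads /var/log/btmp directly, so the below is equivalent to last -f /var/log/btmp.

# lastb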

Scheduler logs : /var/log/cron

Cron (the Linux scheduler) logs are saved under /var/log/cron. Job run status and daemon logs are saved here, which helps in understanding job execution and troubleshooting scheduled jobs. It's a plain text file and supports all normal file operations.

Jun 15 00:09:51 server1 crond[1473]: (CRON) INFO (Shutting down)
Jun 19 07:21:18 server1 crond[1474]: (CRON) STARTUP (1.4.4)
Jun 19 07:21:18 server1 crond[1474]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 26% if used.)
Jun 19 07:21:20 server1 crond[1474]: (CRON) INFO (running with inotify support)
Jun 19 07:30:01 server1 CROND[1676]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 19 07:40:01 server1 CROND[1702]: (root) CMD (/usr/lib64/sa/sa1 1 1)

Since the server was shut down on 15 Jun and started on 19 Jun, you can see the cron daemon wrote shutdown and startup entries in this log file. For executed jobs, the log code CMD is used. Daemon messages are tagged with INFO.

Mail logs : /var/log/maillog

Mail-related logs are saved here. It logs the date and time, hostname, mail daemon (sometimes with PID), and messages.

Jun 15 00:09:51 server1 postfix/postfix-script[1939]: stopping the Postfix mail system
Jun 15 00:09:51 server1 postfix/master[1432]: terminating on signal 15
Jun 19 07:21:17 server1 postfix/postfix-script[1432]: starting the Postfix mail system
Jun 19 07:21:17 server1 postfix/master[1433]: daemon started -- version 2.6.6, configuration /etc/postfix

Since our server doesn't have an SMTP configuration, there is not much in the file shown above.

File transfer logs : /var/log/xferlog

This log contains information from file transfer program/service daemons.

Mon Jun 19 04:44:43 2017 1 10.4.1.22 20 /CSV/test1.csv a _ o r testuser ftp 0 * c
Mon Jun 19 04:51:33 2017 1 10.4.1.22 432 /CSV/test4.csv a _ o r testuser ftp 0 * c
Mon Jun 19 04:57:15 2017 1 10.4.1.22 110 /CSV/test14.csv a _ o r testuser ftp 0 * c
Mon Jun 19 04:57:19 2017 1 10.4.1.22 2505 /CSV/master.csv a _ o r testuser ftp 0 * c

The sample log file shows the date, time, source IP, path of the copied file, options used, the user who copied the files, and the protocol used.

Last login details : /var/log/lastlog

Recent login details of all the system's users are saved to this log file. It's not a plain text file, and hence you need to use the lastlog command, which reads data from this log file and prints it on the terminal in a human-readable format. It lists users in the order they appear in the /etc/passwd file.

# lastlog
Username         Port     From             Latest
root                                       **Never logged in**
bin                                        **Never logged in**
oprofile                                   **Never logged in**
tcpdump                                    **Never logged in**
ec2-user         pts/1    59.184.170.243   Mon Jun 19 07:24:14 -0400 2017
apache                                     **Never logged in**

The above output is pretty self-explanatory.

Device driver messages : dmesg

At boot time, devices/drivers write messages to the console. Those messages can be viewed post-boot as well using the dmesg command. These messages help in troubleshooting device or driver initialization issues.

# dmesg
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-696.el6.x86_64 (mockbuild@x86-027.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Tue Feb 21 00:53:17 EST 2017
Command line: console=ttyS0 ro root=UUID=1b7ea291-67b4-48da-802a-be4b2bcb64d5 rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 xen_blkfront.sda_is_xvda=1 console=ttyS0,115200n8 console=tty0 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009e000 (usable)
 BIOS-e820: 000000000009e000 - 00000000000a0000 (reserved)

----- output trimmed -----

These are all the boot-time messages generated just before the ones in /var/log/boot.log.

System crash dump : /var/crash

This is the directory under which the core dump is saved in case of a system crash. Obviously, you need to configure crash dumps on the system for it. This information is very important for finding the root cause of a system crash, since the dump is almost an image of the state of the system at the time of the crash. Most vendors ask for these dump files to analyze when you log a system crash case with them.
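
On Red Hat family systems the dump is usually collected by kdump; the below commands are a quick way to confirm it is configured (service name and tooling may differ on your distro).

# service kdump status
# grep crashkernel /proc/cmdline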

pvcreate error: Device /dev/xyz not found (or ignored by filtering).

Solution for pvcreate error:  Device /dev/xyz not found (or ignored by filtering). Troubleshooting steps and resolution for this error.

Solution for pvcreate error: Device /dev/xyz not found (or ignored by filtering).

Sometimes when adding a new disk/LUN to a Linux machine using pvcreate, you may come across the below error :

  Device /dev/xyz not found (or ignored by filtering).

# pvcreate /dev/sdb
  Device /dev/sdb not found (or ignored by filtering).

This happens when the disk was previously used elsewhere (possibly partitioned with Linux's own fdisk) and now you are trying to use it in LVM. To resolve this error, first check if it has existing partitions using the fdisk command :

# fdisk /dev/sdb

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): p

Disk /dev/sdb: 859.0 GB, 858993459200 bytes
255 heads, 63 sectors/track, 104433 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x62346fee6

    Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      104433   838858041   83  Linux

In the above example, you can print the current partition table of the disk using the p option in the fdisk menu.

You can see there is one primary partition detected by fdisk. Because of this, the LVM command to initialize this disk (pvcreate) failed.

To resolve this, you need to remove the partition and re-initialize the disk in LVM. To delete the partition, use the d option in the fdisk menu.

# fdisk /dev/sdb

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help):d
Selected partition 1

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

After issuing the delete (d) command in the fdisk menu, you need to write (w) the changes to disk. This will remove the existing partition from the disk. Once again, you can use the print (p) option to make sure that there is no partition left on the disk.

You can now use the disk in LVM without any issue.

# pvcreate /dev/sdb
  Physical volume "/dev/sdb" successfully created

If this solution doesn't work for you, or there were no partitions on the disk previously and you still get this error, then you may want to look at your multipath configuration. The hint is to look at the verbose pvcreate output to check where it's failing: use the pvcreate -vvv /dev/<name> command.
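
For example, the below checks help to spot whether the device is claimed by multipath (if the multipath tools are installed) or rejected by an LVM filter; adjust the device name for your setup.

# multipath -ll
# grep "filter" /etc/lvm/lvm.conf
# pvcreate -vvv /dev/sdb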

How to resolve the fatal error: curses.h: No such file or directory

Learn how to get rid of the fatal error: curses.h: No such file or directory during utility or third-party package installations in Linux.

Solution for curses.h: No such file or directory

Many times during package/utility installations, you must have come across an error like the one below :

fatal error: curses.h: No such file or directory

Recently I faced it while installing cmatrix from source code. I saw an error like one below :

# make
gcc -DHAVE_CONFIG_H -I. -I. -I.     -g -O2 -Wall -Wno-comment -c cmatrix.c
cmatrix.c:37:20: fatal error: curses.h: No such file or directory
 #include <curses.h>
                    ^
compilation terminated.
make: *** [cmatrix.o] Error 1

After troubleshooting, I came up with a solution and was able to get past the make stage. I am sharing it here as it might be useful for you.

The curses.h header file belongs to the ncurses library! You need to install the packages ncurses-devel and ncurses (YUM) or libncurses5-dev (APT) and you will be through this error.

Use yum install ncurses-devel ncurses on YUM based systems (like Red Hat, CentOS, etc.) or apt-get install libncurses5-dev on APT based systems (like Debian, Ubuntu, etc.). Verify that the package is installed and proceed with your next course of action.
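
For quick copy-paste, the install and verification commands look like the below.

# yum install ncurses ncurses-devel      # Red Hat, CentOS and other YUM based systems
# rpm -q ncurses-devel
# apt-get install libncurses5-dev        # Debian, Ubuntu and other APT based systems
# dpkg -l libncurses5-dev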

Follow category ‘Troubleshooting errors‘ for more such error based solutions.