Category Archives: Troubleshooting Errors

networker process not starting

networker service not starting

Quick post to troubleshoot issue with networker service startup

Networker process startup issue

If you have installed a new NetWorker agent on a Linux server and the service does not come up, you will see the message below:

root@kerneltalks ~# /etc/init.d/networker start
root@kerneltalks ~# /etc/init.d/networker status
There are currently no running NetWorker processes.

Troubleshooting

You can dig through the logs or run nsrexecd in debug mode using the command below:

root@kerneltalks ~# nsrexecd -D5

It prints a lot of messages, and you have to go through them to find the possible cause of the issue. I found the offending entry below:

RAP critical 162 Attributes '%s' and/or '%s' of the %s resource do not resolve to the machine's hostname '%s'. To correct the error, it may be necessary to delete the %s database.

Solution

First, check that your /etc/hosts file is correct and has a valid loopback entry. Also confirm that the hostname and the FQDN are what you expect.

cat /etc/hosts |grep loopback
hostname
hostname -f
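
The hostname check above can be scripted. This is only a minimal sketch: hosts_resolves is a hypothetical helper name (not a NetWorker tool), and it assumes a plain hosts-format file.

```shell
# hosts_resolves FILE NAME
# Succeeds when NAME appears as a hostname or alias on a non-comment
# line of a hosts-format FILE. Hypothetical helper, not part of NetWorker.
hosts_resolves() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$1"
}

# Example:
#   hosts_resolves /etc/hosts "$(hostname)" || echo "hostname missing from /etc/hosts"
```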

After that, move your /nsr directory aside and try to restart the service.

mv /nsr /nsr.backup
mkdir /nsr

This should resolve the issue, and you should see that the NetWorker service is up and running:

root@kerneltalks ~# /etc/init.d/networker start
root@kerneltalks ~# /etc/init.d/networker status
+--o nsrexecd (34521)
root@kerneltalks ~# ps -ef | grep -i nsr
root     34521  3 11:17 ?        00:00:00 /usr/sbin/nsrexecd

Troubleshooting Ansible errors

List of errors seen while working on Ansible and their solutions.

Let’s check errors you might come across in Ansible

Error

"msg": "Failed to connect to the host via ssh: ssh: connect to host 172.17.0.9 port 22: No route to host",

Cause and solution

The Ansible control machine is not able to reach the client. Make sure the client hostname resolves via one of the following:

  • DNS server or
  • /etc/hosts of Ansible control server or
  • By /etc/ansible/hosts or your custom Ansible inventory file.

Also make sure network connectivity over port 22 from the Ansible control machine to the client is working (test using telnet).
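
If telnet is not installed on the control machine, the same port test can be done from bash itself. A minimal sketch, assuming bash and the coreutils timeout command are available; port_open is a hypothetical helper name.

```shell
# port_open HOST PORT
# Succeeds when a TCP connection to HOST:PORT opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so neither telnet nor nc is needed.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example:
#   port_open 172.17.0.9 22 || echo "port 22 is not reachable from this machine"
```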


Error

"msg": "Failed to connect to the host via ssh: Permission denied (publickey,password).",

Cause and solution

The Ansible control server failed to authenticate the connection to the client. Check that the SSH key or password used for the connection is valid for the remote user.

Issues while working on ELK stack

A quick post on a couple of errors and their solutions while working on ELK stack.

ELK stack issues and solutions

The ELK stack is Elasticsearch, Logstash, and Kibana. We will walk you through a couple of errors you may see while working on the ELK stack and their solutions.

Error: missing authentication token for REST request

First things first: how do you run the cluster curl commands that are spread all over the Elastic documentation portal? The documentation lets you copy each request as a curl command, but if you run the copied command on your terminal you end up with the error below:

root@kerneltalks # curl -X GET "localhost:9200/_cat/health?v&pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "missing authentication token for REST request [/_cat/health?v&pretty]",
        "header" : {
          "WWW-Authenticate" : "Basic realm=\"security\" charset=\"UTF-8\""
        }
      }
    ],
    "type" : "security_exception",
    "reason" : "missing authentication token for REST request [/_cat/health?v&pretty]",
    "header" : {
      "WWW-Authenticate" : "Basic realm=\"security\" charset=\"UTF-8\""
    }
  },
  "status" : 401
}
Solution:

You need to authenticate within the curl command and you are good to go. It is good practice to pass only the username with the -u switch so that your password does not end up in the command history. Make sure you use the Kibana UI user here.

root@kerneltalks # curl -u kibanaadm -X GET "localhost:9200/_cat/health?v&pretty"
Enter host password for user 'kibanaadm':
epoch      timestamp cluster        status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1578644464 08:21:04  test-elk green           1         1   522 522    0    0        0             0                  -                100.0%

Issue: How to remove x-pack after 6.2 upgrades

If you are running ELK stack 6.2 and are performing an upgrade, you need to take care of the x-pack module first. Since x-pack is included in 6.3 and later distributions, you do not need it as a separate module. But for some reason, during the upgrade the new stack is not able to remove the existing x-pack module. This leaves two x-pack modules on the system, and Kibana restarts continuously because of that with the error below:

Error: Multiple plugins found with the id \"xpack_main\":\n  - xpack_main at /usr/share/kibana/node_modules/x-pack\n  - xpack_main at /usr/share/kibana/plugins/x-pack
Solution:

So, before the upgrade, you need to remove the x-pack plugin from both Elasticsearch and Kibana using the commands below:

root@kerneltalks # /usr/share/elasticsearch/bin/elasticsearch-plugin remove x-pack
-> removing [x-pack]...
-> preserving plugin config files [/etc/elasticsearch/x-pack] in case of upgrade; use --purge if not needed

root@kerneltalks # /usr/share/kibana/bin/kibana-plugin remove x-pack
Removing x-pack...

This will make your upgrade go smoothly. If you have already upgraded (with RPM) and faced the issue, you may try to downgrade the packages with rpm -Uvh --oldpackage <package_name> and then remove the x-pack modules.
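
To confirm whether both copies named in the error are really present, a quick check like the one below may help. It is only a sketch: xpack_copies is a hypothetical helper, and the two sub-paths are taken from the error message above.

```shell
# xpack_copies KIBANA_HOME
# Prints each x-pack directory that exists under KIBANA_HOME in the two
# locations named by the "Multiple plugins found" error.
xpack_copies() {
    for dir in "$1/node_modules/x-pack" "$1/plugins/x-pack"; do
        [ -d "$dir" ] && echo "$dir"
    done
    return 0
}

# Example:
#   xpack_copies /usr/share/kibana
```

Two lines of output mean the old plugin copy is still there alongside the bundled one.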


Issue: How to set Index replicas to 0 on single node ElasticSearch cluster

On a single-node Elasticsearch cluster running the default configuration, you will run into the unassigned replicas issue, because a replica shard cannot be allocated on the same node as its primary. In the Kibana UI you can see the health of those indices as Yellow. Your cluster health will be yellow too, with the message: Elasticsearch cluster status is yellow. Allocate missing replica shards.

Solution:

You need to set the replica count of all indices to zero. You can do this in one go using the command below:

root@kerneltalks # curl -u kibanaadm -X PUT "localhost:9200/_all/_settings?pretty" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "number_of_replicas" : 0
    }
}
'
Enter host password for user 'kibanaadm':
{
  "acknowledged" : true
}

Here _all can be replaced with a specific index name if you want to change only that index. Use the Kibana UI user in the command and you will be asked for the password. Once it is entered, the command alters the setting on all indices and shows the output above.

You can now check in the Kibana UI: your cluster health and index health will both be Green.
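
If you prefer to verify from the terminal, the cat indices API can list the health of each index. The filter below is a minimal sketch; yellow_indices is a hypothetical helper that expects "health index" pairs on stdin, which is what the h=health,index column selection returns.

```shell
# yellow_indices
# Reads "health index" pairs on stdin and prints the names of the
# indices whose health is yellow.
yellow_indices() {
    awk '$1 == "yellow" { print $2 }'
}

# Example (prompts for the password, like the commands above):
#   curl -s -u kibanaadm "localhost:9200/_cat/indices?h=health,index" | yellow_indices
```

Empty output means no index is left in yellow state.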

Troubleshooting errors seen in Linux

I am consolidating the errors I came across and their solutions in a few words, for easy reference for me and you as well!

Troubleshooting Linux errors!

Error seen while starting the MariaDB server process on RHEL 6

# service mysql start
mysql: unrecognized service

Solution: You do not have MariaDB installed on your server. Install MariaDB


Error while starting MariaDB server process on RHEL 7

# systemctl start mariadb
Failed to issue method call: Unit mariadb.service failed to load: No such file or directory.

Solution: You do not have MariaDB installed. Install mariadb-server package


Error while installing Symantec Antivirus

which: no uudecode in (/usr/sbin:/usr/bin:/bin)
ERROR: Required utility missing: uudecode. Please install this
utility before using this Intelligent Updater package.

Solution : uudecode is provided by sharutils package. Install sharutils package.


Error while exporting a filesystem

# exportfs -ra
exportfs: 34.89.123.45:/data: Function not implemented

Solution: Check and start the nfs-server process.


Error while listing directory files

# ls -lrt
ls: cannot open directory '.': Permission denied

Solution: The directory does not have read permission for the owner. This sometimes happens after a Windows-to-Linux file copy. Set the permission and you are good to go. Run this command in the same directory: # chmod -R +r .
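
To confirm the diagnosis before changing permissions, you can look at the owner read bit directly. A small sketch, assuming GNU stat; owner_can_read is a hypothetical helper name.

```shell
# owner_can_read DIR
# Succeeds when the owner read bit is set on DIR. Uses GNU stat, whose
# %A format prints the mode as text, for example drwxr-xr-x.
owner_can_read() {
    mode=$(stat -c '%A' "$1") || return 2
    [ "$(printf '%s' "$mode" | cut -c2)" = "r" ]
}

# Example:
#   owner_can_read . || echo "owner has no read permission on this directory"
```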


Error while querying NTP

# ntpq -p
localhost: timed out, nothing received
***Request timed out

Solution : Edit /etc/ntp.conf and replace restrict 127.0.0.1 with restrict localhost, then restart the ntpd service with systemctl restart ntpd
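
The edit can also be done with sed instead of an editor. A minimal sketch; fix_ntp_restrict is a hypothetical helper, and it keeps a .bak copy of the file it changes.

```shell
# fix_ntp_restrict FILE
# Rewrites "restrict 127.0.0.1" to "restrict localhost" in FILE and keeps
# the original as FILE.bak. Options after the address are left untouched.
fix_ntp_restrict() {
    sed -E -i.bak \
        's/^([[:space:]]*restrict[[:space:]]+)127\.0\.0\.1([[:space:]]|$)/\1localhost\2/' \
        "$1"
}

# Example:
#   fix_ntp_restrict /etc/ntp.conf && systemctl restart ntpd
```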


Error during mounting of the file system

# mount /dev/vg01/lvol0 /dump
mount: unknown filesystem type '(null)'

Solution: You are trying to mount a file system which is not formatted yet. Format filesystem and then try mounting.


Error while mounting other system’s disk

I was trying to mount a disk from another server in AWS and it was not mounting. I checked dmesg and got below error :

[  792.138218] XFS (xvdh2): Filesystem has duplicate UUID d295b18a-2a70-4260-9f59-60e51432ea92 - can't mount

Solution: Since I was only doing some research, I temporarily mounted it with the nouuid option, which skips the duplicate UUID check, using the command below:

root@kerneltalks # mount -t xfs -o nouuid /dev/xvdh2 /disk1

But ideally, every disk on the system should have a unique UUID, and in such a case you can generate a new UUID using the XFS utility xfs_admin.
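
To spot such clashes up front, you can look for repeated UUIDs in blkid output. This is a sketch only; duplicate_uuids is a hypothetical helper that reads blkid-style lines on stdin.

```shell
# duplicate_uuids
# Reads blkid-style lines on stdin and prints each filesystem UUID that
# appears more than once. PARTUUID and UUID_SUB fields are ignored.
duplicate_uuids() {
    sed -n 's/.*[[:space:]]UUID="\([^"]*\)".*/\1/p' | sort | uniq -d
}

# Example:
#   blkid | duplicate_uuids
# A new UUID can then be written to the unmounted duplicate with:
#   xfs_admin -U generate /dev/xvdh2
```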


keytool command not found

keytool is used to generate key or CSR for SSL certificate.

# keytool -genkey -alias server -keyalg RSA -keystore kerneltalks.jks -keysize 2048
If 'keytool' is not a typo you can use command-not-found to lookup the package that contains it, like this:
    cnf keytool

Solution: Make sure you have the JRE (Java Runtime Environment) installed. Go to the JRE binary directory and then run this command.


java version typo

# /usr/bin/java version
Error: Could not find or load main class version

Java is trying to load a program named version. You missed the hyphen there!

Solution: Try below command

# java -version
java version "1.7.0_211"
OpenJDK Runtime Environment (rhel-2.6.17.1.0.1.el7_6-x86_64 u211-b02)
OpenJDK 64-Bit Server VM (build 24.211-b02, mixed mode)

Bad magic number in super-block

Error below seen while trying to resize filesystem in RHEL7

# resize2fs /dev/mapper/vg01-data
 resize2fs 1.42.9 (28-Dec-2013)
 resize2fs: Bad magic number in super-block while trying to open /dev/mapper/vg01-data
 Couldn't find valid filesystem superblock.

Solution: This is because RHEL7 uses the XFS filesystem by default, so you need to use the xfs_growfs command to resize the filesystem.

# xfs_growfs  /dev/vg01/data
meta-data=/dev/mapper/vg01-root  isize=256    agcount=4, agsize=851968 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=3407872, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 3407872 to 7863296
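
If you are not sure which filesystem you are dealing with, check the type first and pick the tool accordingly. A minimal sketch; grow_command is a hypothetical helper that only maps a type name to a tool name.

```shell
# grow_command FSTYPE
# Prints the name of the resize tool that matches a filesystem type.
grow_command() {
    case "$1" in
        xfs)            echo xfs_growfs ;;
        ext2|ext3|ext4) echo resize2fs ;;
        *)              echo "unsupported filesystem type: $1" >&2; return 1 ;;
    esac
}

# Example:
#   grow_command "$(blkid -o value -s TYPE /dev/mapper/vg01-data)"
```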

How to change DocumentRoot in Apache2 to different directory than /srv/www/htdocs

Apache2 has DocumentRoot set to /srv/www/htdocs by default. If you want to change it to a different directory, you need to change it in a couple of configuration files.

An easy way to find all those files is to search the configuration directory:

# grep -R "DocumentRoot" /etc/apache2
# grep -R srv /etc/apache2

Here are a few files and the lines within them you need to edit.

# vi /etc/apache2/default-server.conf 
ScriptAlias /cgi-bin/ "/srv/www/cgi-bin/"
<Directory "/srv/www/cgi-bin">
DocumentRoot "/srv/www/htdocs"
<Directory "/srv/www/htdocs">

# vi /etc/apache2/vhosts.d/vhost-ssl.conf 
DocumentRoot "/srv/www/htdocs"

You need to change /srv/www/htdocs to the directory of your choice. You also need to change the related directories under /srv. Once you are done editing, restart the apache2 service and you are good to go.
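
The htdocs part of the edit can be done with sed across all the files at once. A minimal sketch; retarget_docroot is a hypothetical helper, it keeps .bak copies, and it assumes the new directory contains no | or & characters.

```shell
# retarget_docroot NEW_DIR FILE...
# Replaces every occurrence of /srv/www/htdocs with NEW_DIR in each FILE
# and keeps the originals as FILE.bak.
# Assumption: NEW_DIR contains no "|" or "&" (both are special to sed here).
retarget_docroot() {
    new_dir=$1
    shift
    sed -i.bak "s|/srv/www/htdocs|$new_dir|g" "$@"
}

# Example (the target directory /data/www is made up for illustration):
#   retarget_docroot /data/www /etc/apache2/default-server.conf /etc/apache2/vhosts.d/vhost-ssl.conf
```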


server_id_use_crc warning in SUSE Manager

The warning below is logged repeatedly in /var/log/messages on SUSE Manager server 4.0

2019-08-07T20:38:02.832696+08:00 susemgr-test salt-master[12485]: [WARNING ] /usr/lib/python3.6/site-packages/salt/grains/core.py:2815: DeprecationWarning: This server_id is computed nor by Adler32 neither by CRC32. Please use "server_id_use_crc" option and define algorithm youprefer (default "Adler32"). The server_id will be computed withAdler32 by default.

Solution : Add the entry server_id_use_crc: adler32 at the end of the file /etc/salt/master.d/susemanager.conf and then restart the SUSE Manager process.
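
If you script this change, it is worth making it safe to run twice. A small sketch; ensure_line is a hypothetical helper, not something shipped with SUSE Manager or Salt.

```shell
# ensure_line LINE FILE
# Appends LINE to FILE unless an identical line is already present,
# so running it repeatedly never duplicates the entry.
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example:
#   ensure_line 'server_id_use_crc: adler32' /etc/salt/master.d/susemanager.conf
```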


smdba backup fails to run in cron on SUSE Manager

smdba is a DB backup tool by SUSE for SUSE Manager, which runs on a PostgreSQL database. The smdba tool has to be run by root; in the backend it switches to the DB user to connect to the database and do its work. It runs fine manually, but when scheduled in cron it exits with the error below.

Backend error:
        Access denied to UID 'postgres' via sudo.

You can see this error in root's mail, or you can redirect stderr of the cron command to a file and see it there.

Solution: This happens because root is not able to sudo to the postgres user: processes spawned by cron do not have a tty attached, and your sudo most likely has Defaults requiretty active in /etc/sudoers. You can either disable it system-wide by putting # in front of it, or add a dedicated entry for root, Defaults:root !requiretty, to lift this restriction for root only. Once done, try running smdba commands via cron and they will run successfully.


/etc/resolv.conf resetting to default after reboot

Issue: My /etc/resolv.conf entries get wiped out after reboot. Manual entries added in /etc/resolv.conf are deleted after reboot.

Solution: This is probably because your /etc/resolv.conf is auto-generated by netconfig. It will be a symlink to /var/run/netconfig/resolv.conf. You can disable this by setting NETCONFIG_DNS_POLICY='' in the /etc/sysconfig/network/config file: it will be defined as auto, and you set it to blank. Or, if you want to keep the policy parameter untouched, you can edit the parameters below in the same file.

NETCONFIG_DNS_STATIC_SEARCHLIST
NETCONFIG_DNS_STATIC_SERVERS
NETCONFIG_DNS_FORWARDER

Once done, regenerate /etc/resolv.conf by running the command netconfig update -f. If your /etc/resolv.conf keeps your entries after this, you are good; otherwise you need to review the above settings again carefully.

If it is being reloaded by DHCP you will see below line in /etc/resolv.conf

; generated by /usr/sbin/dhclient-script

In that case you need to perform below actions.

# vi /etc/dhcp/dhclient-enter-hooks
#!/bin/sh
make_resolv_conf(){
    :
}
# chmod +x /etc/dhcp/dhclient-enter-hooks
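
To tell quickly which of the two cases above applies, you can inspect the file itself. A rough sketch; resolv_conf_owner is a hypothetical helper, and it only knows the two markers described in this post (the netconfig symlink and the dhclient-script comment).

```shell
# resolv_conf_owner FILE
# Prints "netconfig" when FILE is a symlink into a netconfig directory,
# "dhclient" when it carries the dhclient-script comment, else "manual".
resolv_conf_owner() {
    if [ -L "$1" ] && readlink "$1" | grep -q 'netconfig'; then
        echo netconfig
    elif grep -q 'generated by /usr/sbin/dhclient-script' "$1" 2>/dev/null; then
        echo dhclient
    else
        echo manual
    fi
}

# Example:
#   resolv_conf_owner /etc/resolv.conf
```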

yum command giving metadata errors

yum command showing below error :

Error while executing packages action: failed to retrieve repodata/filelists.xml.gz from Oraclelinux7-x86_64 error was [Errno -1] Metadata file does not match checksum
Solution :

Run below commands and you are good to go.

# yum clean all
# yum makecache

PAM module error

PAM unable to dlopen(/lib64/security/pam_gnome_keyring.so): /lib64/security/pam_gnome_keyring.so: cannot open shared object file: No such file or directory
PAM adding faulty module: /lib64/security/pam_gnome_keyring.so
Solution :

Update pam packages and/or install gnome-keyring package.

Account login error with LDAP

pam_sss(sudo:auth): received for user shrikant: 10 (User not known to the underlying authentication module)
Solution :

This is because the account shrikant does not exist in the LDAP server account list. If this is a user local to that particular client, you can add it to the ignore list in the sssd config file /etc/sssd/sssd.conf with the parameters below.

[nss]
filter_users = root,shrikant
filter_groups = root,dba

NFS Timeout error

# mount -v -t nfs 10.10.1.2:/data /mnt/data
mount.nfs: timeout set for Wed Jan 29 08:29:01 2020
mount.nfs: trying text-based options 'vers=4,addr=10.10.1.2,clientaddr=10.10.1.3'
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out

Solution :

This is because the client is not able to reach the NFS server. There are a couple of things you should check.

  • TCP and UDP ports 2049 and 111 should be open between client and server. Test with nc -v <nfs_server> <port> for TCP and nc -v -u <nfs_server> <port> for UDP
  • NFS server service should be running on the server
  • NFS client service should be running on the client
  • If you have SEP 14 (Symantec Endpoint Protection) antivirus running on your machines then un-install and reboot both client and server.

Warning in xclock command

# xclock
Warning: Missing charsets in String to FontSet conversion

Solution:

This is just a warning about improper environment variables. You can avoid it by exporting –

export LC_ALL=C

You can add this to the user's profile file as well, so that it is exported at login and you do not need to export it manually.

sssd service is not starting up

After patching or activities like a system migration, sssd may not start up. When you try to start the sssd service, you get the errors below in systemctl status sssd :

sssd[16866]: Exiting the SSSD. Could not restart critical service [kerneltalks.com].
systemd[1]: sssd.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start System Security Services Daemon.
systemd[1]: sssd.service: Unit entered failed state.
systemd[1]: sssd.service: Failed with result 'exit-code'.

In such cases the best way to check actual errors is to check the log file located in /var/log/sssd/sssd*.log. You can see sssd logs as well as domain logs here. You need to check both.

In my case I got errors in domain log file –

[sssd[be[kerneltalks.com]]] [dp_target_init] (0x0010): Unable to load module krb5
[sssd[be[kerneltalks.com]]] [be_process_init] (0x0010): Unable to setup data provider [1432158209]: Internal Error
[sssd[be[kerneltalks.com]]] [main] (0x0010): Could not initialize backend [1432158209]
[sssd[be[kerneltalks.com]]] [dp_module_open_lib] (0x0010): Unable to load module [krb5] with path [/usr/lib64/sssd/libsss_krb5.so]: /usr/lib64/sssd/libsss_krb5.so: cannot open shared object file: No such file or directory

For this missing file, I installed sssd-krb5 package and my issue got resolved.
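
With several log files to read, a small filter can pull out the module names for you. A sketch only; missing_sssd_modules is a hypothetical helper that matches the bracketed form of the message shown above.

```shell
# missing_sssd_modules
# Reads sssd log lines on stdin and prints, once each, every module named
# in an "Unable to load module [NAME]" entry.
missing_sssd_modules() {
    sed -n 's/.*Unable to load module \[\([^]]*\)\].*/\1/p' | sort -u
}

# Example:
#   cat /var/log/sssd/sssd*.log | missing_sssd_modules
```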

sssd service is running but user can not login

sssd service was running fine but showing below error in systemctl status sssd and the user was not able to log in –

 sssd_be[2338]: GSSAPI Error: An invalid name was supplied (Success)
Solution :

Add below line under section [libdefaults] in /etc/krb5.conf

rdns = false

then restart sssd service using systemctl restart sssd

MobaXterm X11 proxy: Authorisation not recognised

Learn how to resolve Authorisation not recognized error while using xterm in Linux

xclock error

Error :

Sometimes your users complain that they can’t use the GUI via the X server from a Linux box (in this case MobaXterm). They are told that their display authorization is not recognized, with an error like the one below:

appuser@kerneltalks $ xclock
MobaXterm X11 proxy: Authorisation not recognised
Error: Can't open display: localhost:10.0

Sometimes these errors show up when you switch user from the root account or any other account.

Quick Solution:

Log in directly as the user with which you want to use xclock

appuser needs to log in directly on the server and you won’t see this issue. Most of the time it arises once you su to appuser from root or different users.

Read further if you have to switch user and then use x-term.

appuser needs to add its entry to the authorization list. This entry will be the last entry in the .Xauthority file in the home directory of the user with which you first logged in to the server. Let’s say it is root in our case, i.e. we logged in as root and then su to appuser

root@kerneltalks # xauth -f .Xauthority list |tail -1
kerneltalks/unix:10 MIT-MAGIC-COOKIE-1 df22dfc7df88b60f0653198cc85f543c

appuser@kerneltalks $ xauth add kerneltalks/unix:10 MIT-MAGIC-COOKIE-1 df22dfc7df88b60f0653198cc85f543c

So here we took the values from the file in root's home directory and added them using xauth as the user we su to, i.e. appuser

and you are good to go!
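
The copy-and-paste step can be avoided by generating the xauth add command from the listing. A minimal sketch; xauth_add_last is a hypothetical helper that reads xauth list output on stdin and only prints the command, it does not run it.

```shell
# xauth_add_last
# Reads `xauth list` output on stdin and prints an `xauth add` command
# for the last entry (display, protocol, hex key).
xauth_add_last() {
    awk 'NF >= 3 { line = $1 " " $2 " " $3 }
         END     { if (line != "") print "xauth add " line }'
}

# Example, run as the first login user (root here) before switching user:
#   xauth -f ~/.Xauthority list | xauth_add_last
```

Run the printed command as appuser after the su.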

Bit of an explanation :

This error occurs because your ID does not have the authorization to connect to the X server. Let’s walk through how to resolve this error. List the authorization entries for displays using xauth list

appuser@kerneltalks $ xauth list
kerneltalks/unix:12  MIT-MAGIC-COOKIE-1  60c402df81f68e721qwe531d1c99c1eb
kerneltalks/unix:11  MIT-MAGIC-COOKIE-1  ad81da801d778fqwe6aea383635be27d
kerneltalks/unix:10  MIT-MAGIC-COOKIE-1  0bd591485031d0ae670475g46db1b8b9

The output shows entries column wise –

  1. Display name
  2. Protocol name (a single period is treated as an abbreviation for MIT-MAGIC-COOKIE-1)
  3. hexkey

If you have many sessions, you are on a test/dev environment, and you are the only one using the system, you can remove all the above entries using xauth remove so that you start with a clean slate and get only your own session cookie. Or, you can save this output for reference, log in again, and run xclock so that a new entry is generated; then compare the latest output with the older one to pick out your new entry. Or, as mentioned in the quick solution above, it will be the last entry in the .Xauthority file in the home directory of appuser. You cannot read the .Xauthority file like a text file, so you have to use the xauth -f command to view its content.

Logout from all sessions. Login again with the app user and run xclock once. This will generate a new session cookie token which you can see in xauth list .

appuser@kerneltalks $ xauth list
kerneltalks/unix:10  MIT-MAGIC-COOKIE-1  df22dfc7df88b60f0653198cc85f543c

Now, grab this entry and add authorization using below command –

appuser@kerneltalks $ xauth add kerneltalks/unix:10  MIT-MAGIC-COOKIE-1  df22dfc7df88b60f0653198cc85f543c

and that’s it. Your xclock should work now!


Error :

You are seeing below error in mobaXterm

X11-forwarding  : ✘  (disabled or not supported by server)

Solution :

The best way to make sure you have all the X11 stuff installed is to install the xclock package. Additionally, you need to install the xauth package as well.

Secondly, make sure you have X11Forwarding yes set in your /etc/ssh/sshd_config. If not then set and restart sshd daemon.

That’s all! Try logging in to the server again and it should work. You should see the message below after login using MobaXterm.

 X11-forwarding  : ✔  (remote display is forwarded through SSH)

How to resolve setenv: command not found

setenv is a built-in command of csh. You need the C shell to get rid of the setenv: command not found error.

setenv: command not found resolution

Error :

Set environment command setenv is not available on the system. You see below error :

root@kerneltalks # setenv
-bash: setenv: command not found

So the question is how to install the setenv command.

Solution :

setenv is a shell built-in command that comes with the C shell, csh. The above error could be due to two things:

  1. csh is not installed on the server
  2. The user has not invoked the csh shell

For point 1, go ahead and install csh package.

For point 2, simply invoke the csh shell by changing the user's login shell (usermod -s) or use the chsh command as below:

root@kerneltalks # chsh root
Changing shell for root.
New shell [/bin/bash]: /bin/csh
Shell changed.

And to change shell on the fly for your current logged-in session use below command –

root@kerneltalks # echo $0
bash
root@kerneltalks # csh
root@kerneltalks # echo $0
csh

Now that the csh shell is available, setenv runs smoothly!

# setenv
REMOTEHOST=210.23.23.456
XDG_SESSION_ID=1
HOSTNAME=kerneltalks
HOST=kerneltalks
TERM=xterm
SHELL=/bin/bash
HISTSIZE=1000
GROUP=root
USER=root
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
HOSTTYPE=x86_64-linux
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
MAIL=/var/spool/mail/root
PWD=/root
LANG=en_US.UTF-8
HISTCONTROL=ignoredups
HOME=/root
SHLVL=6
OSTYPE=linux
VENDOR=unknown
MACHTYPE=x86_64
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
_=/bin/csh

VMware tools not running after Linux kernel upgrade

Solution for VMware tools not running after Linux kernel upgrade in guest VM

In this article, we will discuss solutions when VMware tools are not running after the Linux kernel upgrade.

Cause :

After a kernel upgrade in the guest Linux VM, you may see that VMware tools are not running. This is because some VMware tools modules run using kernel library files. After a kernel upgrade, they point to different library files than the ones used by the running kernel and hence fail to start.

Solution :

The issue can be resolved by reconfiguring VMware tools after the kernel upgrade. This process is on the fly and does not require downtime.

Login to Guest Linux operating system using root account and run reconfiguration script /usr/bin/vmware-config-tools.pl

You will be asked to make a few choices. If you know about those modules, choose your answers according to your requirements; otherwise just hit Enter to accept the defaults. See the sample output below:

root@kerneltalks # /usr/bin/vmware-config-tools.pl
Initializing...
Making sure services for VMware Tools are stopped.
Found a compatible pre-built module for vmci.  Installing it...
Found a compatible pre-built module for vsock.  Installing it...
The module vmxnet3 has already been installed on this system by another
installer or package and will not be modified by this installer.

The module pvscsi has already been installed on this system by another
installer or package and will not be modified by this installer.

The module vmmemctl has already been installed on this system by another
installer or package and will not be modified by this installer.

The VMware Host-Guest Filesystem allows for shared folders between the host OS
and the guest OS in a Fusion or Workstation virtual environment.  Do you wish
to enable this feature? [no]

Found a compatible pre-built module for vmxnet.  Installing it...


The vmblock enables dragging or copying files between host and guest in a
Fusion or Workstation virtual environment.  Do you wish to enable this feature?
[no]

VMware automatic kernel modules enables automatic building and installation of
VMware kernel modules at boot that are not already present. This feature can
be enabled/disabled by re-running vmware-config-tools.pl.

Would you like to enable VMware automatic kernel modules?
[no]

Do you want to enable Guest Authentication (vgauth)? Enabling vgauth is needed
if you want to enable Common Agent (caf). [yes]

Do you want to enable Common Agent (caf)? [yes]

No X install found.

Creating a new initrd boot image for the kernel.

NOTE: both /etc/vmware-tools/GuestProxyData/server/key.pem and
      /etc/vmware-tools/GuestProxyData/server/cert.pem already exist.
      They are not generated again. To regenerate them by force,
      use the "vmware-guestproxycerttool -g -f" command.

vmware-tools start/running
The configuration of VMware Tools 10.0.6 build-3560309 for Linux for this
running kernel completed successfully.

You must restart your X session before any mouse or graphics changes take
effect.

You can now run VMware Tools by invoking "/usr/bin/vmware-toolbox-cmd" from the
command line.

To enable advanced X features (e.g., guest resolution fit, drag and drop, and
file and text copy/paste), you will need to do one (or more) of the following:
1. Manually start /usr/bin/vmware-user
2. Log out and log back into your desktop session; and,
3. Restart your X session.

Enjoy,

--the VMware team

If you are OK with the defaults and want the script to run non-interactively, run it with the -d default switch.

root@kerneltalks # /usr/bin/vmware-config-tools.pl -d default

Once the script finishes execution, you can see in the VMware console that VMware tools are running on the guest VM!

Failed to mount cd error in Zypper

Troubleshooting to get rid of failed to mount CD error due to CD repo in zypper.

Error :

While trying to install a package with zypper I came across the error below :

Failed to mount cd:///?devices=/dev/disk/by-id/ata-VMware_Virtual_IDE_CDROM_Drive_10000000000000000001 on /var/adm/mount/AP_0xFre2nn: Mounting media failed (mount: no medium found on /dev/sr0)

Detailed error snippet below :

# zypper in salt-minion
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 16 NEW packages are going to be installed:
  libzmq3 python-Jinja2 python-MarkupSafe python-PyYAML python-backports.ssl_match_hostname python-futures python-msgpack-python python-netaddr python-psutil
  python-pycrypto python-pyzmq python-requests python-simplejson python-tornado salt salt-minion

The following 2 recommended packages were automatically selected:
  python-futures python-netaddr

The following 15 packages are not supported by their vendor:
  libzmq3 python-Jinja2 python-MarkupSafe python-PyYAML python-backports.ssl_match_hostname python-futures python-msgpack-python python-psutil python-pycrypto
  python-pyzmq python-requests python-simplejson python-tornado salt salt-minion

16 new packages to install.
Overall download size: 9.0 MiB. Already cached: 0 B. After the operation, additional 48.0 MiB will be used.
Continue? [y/n/? shows all options] (y): y
Retrieving package python-netaddr-0.7.10-8.5.noarch                                                                       (1/16), 896.9 KiB (  4.2 MiB unpacked)
Failed to mount cd:///?devices=/dev/disk/by-id/ata-VMware_Virtual_IDE_CDROM_Drive_10000000000000000001 on /var/adm/mount/AP_0xFre2nn: Mounting media failed (mount: no medium found on /dev/sr0)

Please insert medium [SLES12-SP1-12.1-0] #1 and type 'y' to continue or 'n' to cancel the operation. [yes/no] (no): n
Problem occured during or after installation or removal of packages:
Installation aborted by user

Please see the above error message for a hint.

Cause :

This error is nothing but zypper trying to read repo information from a CD/DVD. Since one of the zypper repos is configured to look for mountable media, it is doing its job. But that media is currently not connected to the system, and hence zypper fails to read details from it.

Solution :

List your zypper repo using the command :

# zypper lr --details
# | Alias                | Name                 | Enabled | GPG Check | Refresh | Priority | Type   | URI                                                                                    | Service
--+----------------------+----------------------+---------+-----------+---------+----------+--------+----------------------------------------------------------------------------------------+--------
1 | SLES12-SP1-12.1-0    | SLES12-SP1-12.1-0    | Yes     | (r ) Yes  | No      |   99     | yast2  | cd:///?devices=/dev/disk/by-id/ata-VMware_Virtual_IDE_CDROM_Drive_10000000000000000001 |
2 | sles12-sp1-bootstrap | sles12-sp1-bootstrap | Yes     | ( p) Yes  | No      |   99     | rpm-md | http://repo.kerneltalks.com/pub/repositories/sle/12/1/bootstrap                 |

Here you can see that the first repo's URI points to a CD. You can either mount the CD or disable that repo for the time being and move ahead with the installation.
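On a server with many repos it can help to pick out the media-backed ones automatically. The sketch below is only an illustration: find_media_repos is a hypothetical helper name, it assumes the table layout shown in the listing above (Enabled in the fourth column), and it runs on two rows copied from that listing instead of a live system.

```shell
# Two data rows copied from the `zypper lr --details` listing in this article.
sample='1 | SLES12-SP1-12.1-0    | SLES12-SP1-12.1-0    | Yes     | (r ) Yes  | No      |   99     | yast2  | cd:///?devices=/dev/disk/by-id/ata-VMware_Virtual_IDE_CDROM_Drive_10000000000000000001 |
2 | sles12-sp1-bootstrap | sles12-sp1-bootstrap | Yes     | ( p) Yes  | No      |   99     | rpm-md | http://repo.kerneltalks.com/pub/repositories/sle/12/1/bootstrap                 |'

# find_media_repos (hypothetical helper): read the table on stdin and print
# "<number> <alias>" for each enabled repo whose URI starts with cd:// or dvd://.
find_media_repos() {
  awk -F'|' '
    $4 ~ /Yes/ {
      for (i = 5; i <= NF; i++) {
        if ($i ~ /^[ \t]*(cd|dvd):\/\//) {
          gsub(/[ \t]/, "", $1)   # strip padding around the repo number
          gsub(/[ \t]/, "", $2)   # strip padding around the alias
          print $1, $2
          break
        }
      }
    }'
}

media_repos=$(printf '%s\n' "$sample" | find_media_repos)
echo "$media_repos"   # prints: 1 SLES12-SP1-12.1-0
```

On a live system the same function could be fed with zypper lr --details | find_media_repos, and the number it prints is the one to pass to zypper mr --disable.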

Use the below command to disable the CD repo. Make sure you enter the correct repo number in the command (here it's 1).

# zypper mr --disable 1
Repository 'SLES12-SP1-12.1-0' has been successfully disabled.

Once the CD/DVD repo is disabled, re-run the zypper installation command and it will complete without errors!

# zypper in salt-minion
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 15 NEW packages are going to be installed:
  libzmq3 python-Jinja2 python-MarkupSafe python-PyYAML python-backports.ssl_match_hostname python-futures python-msgpack-python python-psutil python-pycrypto
  python-pyzmq python-requests python-simplejson python-tornado salt salt-minion

The following recommended package was automatically selected:
  python-futures

The following 15 packages are not supported by their vendor:
  libzmq3 python-Jinja2 python-MarkupSafe python-PyYAML python-backports.ssl_match_hostname python-futures python-msgpack-python python-psutil python-pycrypto
  python-pyzmq python-requests python-simplejson python-tornado salt salt-minion

15 new packages to install.
Overall download size: 8.1 MiB. Already cached: 0 B. After the operation, additional 43.8 MiB will be used.
Continue? [y/n/? shows all options] (y): y
Retrieving package python-backports.ssl_match_hostname-3.4.0.2-17.1.noarch                                                (1/15),  10.4 KiB ( 14.3 KiB unpacked)
Retrieving: python-backports.ssl_match_hostname-3.4.0.2-17.1.noarch.rpm ..................................................................................[done]
Retrieving package python-futures-3.0.2-7.1.noarch                                                                        (2/15),  23.5 KiB ( 85.6 KiB unpacked)
Retrieving: python-futures-3.0.2-7.1.noarch.rpm ..........................................................................................................[done]
Retrieving package python-requests-2.11.1-6.20.1.noarch                                                                   (3/15), 396.8 KiB (  1.9 MiB unpacked)
Retrieving: python-requests-2.11.1-6.20.1.noarch.rpm .....................................................................................................[done]
Retrieving package libzmq3-4.0.4-6.1.x86_64                                                                               (4/15), 278.6 KiB (676.6 KiB unpacked)
Retrieving: libzmq3-4.0.4-6.1.x86_64.rpm .................................................................................................................[done]
Retrieving package python-MarkupSafe-0.18-7.1.x86_64                                                                      (5/15),  24.6 KiB ( 66.0 KiB unpacked)
Retrieving: python-MarkupSafe-0.18-7.1.x86_64.rpm ........................................................................................................[done]
Retrieving package python-PyYAML-3.12-25.1.x86_64                                                                         (6/15), 154.6 KiB (625.5 KiB unpacked)
Retrieving: python-PyYAML-3.12-25.1.x86_64.rpm ...........................................................................................................[done]
Retrieving package python-msgpack-python-0.4.6-2.1.x86_64                                                                 (7/15),  67.0 KiB (221.0 KiB unpacked)
Retrieving: python-msgpack-python-0.4.6-2.1.x86_64.rpm ...................................................................................................[done]
Retrieving package python-psutil-1.2.1-9.1.x86_64                                                                         (8/15), 100.3 KiB (444.6 KiB unpacked)
Retrieving: python-psutil-1.2.1-9.1.x86_64.rpm ...........................................................................................................[done]
Retrieving package python-pycrypto-2.6.1-4.1.x86_64                                                                       (9/15), 371.5 KiB (  2.0 MiB unpacked)
Retrieving: python-pycrypto-2.6.1-4.1.x86_64.rpm .........................................................................................................[done]
Retrieving package python-simplejson-3.8.2-4.1.x86_64                                                                    (10/15), 105.0 KiB (384.5 KiB unpacked)
Retrieving: python-simplejson-3.8.2-4.1.x86_64.rpm .......................................................................................................[done]
Retrieving package python-pyzmq-14.0.0-3.1.x86_64                                                                        (11/15), 510.3 KiB (  1.5 MiB unpacked)
Retrieving: python-pyzmq-14.0.0-3.1.x86_64.rpm ...........................................................................................................[done]
Retrieving package python-Jinja2-2.7.3-17.1.noarch                                                                       (12/15), 278.5 KiB (  1.7 MiB unpacked)
Retrieving: python-Jinja2-2.7.3-17.1.noarch.rpm ..........................................................................................................[done]
Retrieving package python-tornado-4.2.1-9.1.x86_64                                                                       (13/15), 547.1 KiB (  2.8 MiB unpacked)
Retrieving: python-tornado-4.2.1-9.1.x86_64.rpm ..........................................................................................................[done]
Retrieving package salt-2016.11.4-45.2.x86_64                                                                            (14/15),   5.2 MiB ( 31.4 MiB unpacked)
Retrieving: salt-2016.11.4-45.2.x86_64.rpm ...............................................................................................................[done]
Retrieving package salt-minion-2016.11.4-45.2.x86_64                                                                     (15/15), 107.8 KiB ( 36.9 KiB unpacked)
Retrieving: salt-minion-2016.11.4-45.2.x86_64.rpm ........................................................................................................[done]
Checking for file conflicts: .............................................................................................................................[done]
( 1/15) Installing: python-backports.ssl_match_hostname-3.4.0.2-17.1 .....................................................................................[done]
( 2/15) Installing: python-futures-3.0.2-7.1 .............................................................................................................[done]
( 3/15) Installing: python-requests-2.11.1-6.20.1 ........................................................................................................[done]
( 4/15) Installing: libzmq3-4.0.4-6.1 ....................................................................................................................[done]
( 5/15) Installing: python-MarkupSafe-0.18-7.1 ...........................................................................................................[done]
( 6/15) Installing: python-PyYAML-3.12-25.1 ..............................................................................................................[done]
( 7/15) Installing: python-msgpack-python-0.4.6-2.1 ......................................................................................................[done]
( 8/15) Installing: python-psutil-1.2.1-9.1 ..............................................................................................................[done]
( 9/15) Installing: python-pycrypto-2.6.1-4.1 ............................................................................................................[done]
(10/15) Installing: python-simplejson-3.8.2-4.1 ..........................................................................................................[done]
(11/15) Installing: python-pyzmq-14.0.0-3.1 ..............................................................................................................[done]
(12/15) Installing: python-Jinja2-2.7.3-17.1 .............................................................................................................[done]
(13/15) Installing: python-tornado-4.2.1-9.1 .............................................................................................................[done]
(14/15) Installing: salt-2016.11.4-45.2 ..................................................................................................................[done]
(15/15) Installing: salt-minion-2016.11.4-45.2 ...........................................................................................................[done]

You can re-enable the CD/DVD repo with zypper mr --enable 1 once the related media is mounted on the server.

/bin/bash^M: bad interpreter: No such file or directory

This article explains how to resolve the /bin/bash^M: bad interpreter: No such file or directory error on a Unix or Linux server.

How to resolve /bin/bash^M: bad interpreter: No such file or directory

Issue :

Sometimes we see the below error while running scripts:

root@kerneltalks # ./test_script.sh
-bash: ./test_script.sh: /bin/bash^M: bad interpreter: No such file or directory

This issue occurs with files that were created or updated on Windows and later copied over to a Unix or Linux machine to execute. Windows (DOS) ends every line with a carriage return followed by a line feed (CR+LF), while Linux/Unix uses a line feed (LF) only. On *nix systems the leftover carriage return is treated as an ordinary character and displayed as ^M. That is why you see ^M in the above error: it sits at the end of the very first line of the script, #!/bin/bash, so the kernel looks for an interpreter literally named /bin/bash^M, which does not exist.
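If you want to confirm that carriage returns are really the culprit before converting anything, you can look at the raw bytes. The following is a minimal sketch that builds its own sample script in a temporary directory (the file contents are made up for illustration) and then inspects it with od and grep.

```shell
# Build a small script with Windows (CRLF) line endings in a temp directory.
workdir=$(mktemp -d)
printf '#!/bin/bash\r\necho hello\r\n' > "$workdir/test_script.sh"

# Dump the first line byte by byte; the \r shown before \n is the ^M
# that appears in the "bad interpreter" error.
head -n 1 "$workdir/test_script.sh" | od -c

# Count the lines that end in a carriage return.
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" "$workdir/test_script.sh")
echo "lines ending in CR: $crlf_lines"
```

A count greater than zero means the file still has DOS line endings and needs one of the conversions described in this article.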

To resolve this issue you need to convert the DOS-format file into a Unix-format one. You can either re-write the whole file using a text editor on the Linux/Unix system, or you can use tools like dos2unix or native commands like sed.

Solution:

Use the dos2unix utility, which is available on almost all distributions nowadays (install it from your distribution's repositories if it is missing). The dos2unix project is hosted here.

dos2unix offers several conversion modes. -ascii is the default mode and converts only line breaks. Here I used -iso, which also converts characters between the DOS code page and ISO-8859-1, and it worked fine for me.

The syntax is pretty simple: give the conversion mode along with the source and destination filenames (-n writes the result to a new file).

root@kerneltalks # dos2unix -iso -n test_script.sh script_new.sh
dos2unix: active code page: 0
dos2unix: using code page 437.
dos2unix: converting file test_script.sh to file script_new.sh in Unix format ...

This way the original file stays intact. If you are OK with editing the original file directly, you can use the below command:

root@kerneltalks # dos2unix -k -o test_script.sh
dos2unix: converting file test_script.sh to Unix format ...

Here -k keeps the timestamp of the file intact and -o converts the file in place, overwriting the original.

Or

You can use the stream editor sed to search for and remove the carriage returns globally:

root@kerneltalks # sed -i -e 's/\r$//' test_script.sh

Here -i edits the source file in place and overwrites it, and -e supplies the script (the expression that follows) to run on the file. The expression s/\r$// removes the carriage return at the end of every line.
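A slightly more cautious variant of the sed approach keeps a backup and verifies the result. This is a sketch under a few assumptions: GNU sed (the \r escape and -i are not portable to every sed), and a throwaway sample script created in a temporary directory purely for the demonstration.

```shell
# Create a sample script with CRLF line endings (illustrative content only).
workdir=$(mktemp -d)
script="$workdir/test_script.sh"
printf '#!/bin/bash\r\necho converted\r\n' > "$script"

# -i.bak edits in place but first saves the original as test_script.sh.bak.
sed -i.bak -e 's/\r$//' "$script"

# Verify: no line should end in a carriage return any more, and the
# shebang line should now be exactly #!/bin/bash.
cr=$(printf '\r')
remaining=$(grep -c "${cr}\$" "$script" || true)
first_line=$(head -n 1 "$script")
echo "CR lines left: $remaining, first line: $first_line"
```

If the converted script misbehaves, the untouched original is still available next to it with the .bak suffix.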

That’s it. You repaired your file from Windows to run fine on the Linux system! Go ahead… execute…!

Space is not released after deleting files in Linux?

Troubleshooting guide to reclaim space on disk after deleting files in Linux.

Space is not released after deleting files in Linux? Read this troubleshooting guide

A common issue Linux and Unix users face is that disk space is not released even after files are deleted. Sysadmins try to recover disk space by deleting large files in a mount point and then find that disk utilization stays the same. Similarly, application users move or delete large log files and still cannot reclaim space on the mount point.

In this troubleshooting guide, I will walk you through the steps that help you reclaim disk space after deleting files. Here we will learn how to deal with deleted files that are still open in Linux. Most of the time the files were deleted manually, but the processes using them keep them open, so the kernel cannot free their blocks. As a result df still counts the space as used even though the files are no longer visible in the directory.

Process stop/start/restart

To resolve this issue, you need to gracefully or forcefully end the processes using those deleted files. First, get a list of deleted files that are still held open by processes. Use the lsof (list open files) command with the +L1 switch, which lists open files with a link count of less than 1, or simply grep for deleted in plain lsof output:

root@kerneltalks # lsof +L1
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NLINK    NODE NAME
tuned   777 root    7u   REG  202,2     4096     0 8827610 /tmp/ffiJEo5nz (deleted)

root@kerneltalks # lsof | grep -i deleted
tuned   777 root    7u   REG  202,2     4096     0 8827610 /tmp/ffiJEo5nz (deleted)

lsof output can be read column-wise as below:

  1. command
  2. PID
  3. user
  4. FD
  5. type
  6. device
  7. size/offset
  8. link count (NLINK)
  9. node
  10. name

Now, in the above output, note the PID 777 and stop that process. If you cannot stop it, you can kill the process. For application processes, refer to the application guides on how to stop, start, or restart them. Stopping or restarting the process closes the file descriptor that was keeping the deleted file open. Once that is done, the space is released and you can observe reduced utilization in df output.
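If lsof is not installed, roughly the same information can be read straight from /proc on Linux. The sketch below is not a replacement for lsof, just an illustration: list_deleted_open is a hypothetical helper name, and the demo file is created and deleted by the snippet itself while the current shell holds it open on descriptor 3.

```shell
# list_deleted_open (hypothetical helper): print "PID FD target" for every
# file descriptor under /proc whose target is marked "(deleted)".
list_deleted_open() {
  for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      *' (deleted)')
        pid=${fd#/proc/}      # drop the leading /proc/
        pid=${pid%%/*}        # keep only the PID component
        echo "$pid ${fd##*/} $target"
        ;;
    esac
  done
}

# Demo: open a file on fd 3 in this shell, then delete it.
workdir=$(mktemp -d)
echo "some log data" > "$workdir/demo.log"
exec 3< "$workdir/demo.log"
rm -f "$workdir/demo.log"

# The deleted file still shows up against this shell's PID ($$).
found=$(list_deleted_open | grep "^$$ 3 " || true)
echo "$found"

exec 3<&-   # closing the descriptor is what finally releases the space
```

Run as a normal user, the function only sees descriptors of that user's own processes; run it as root to scan the whole system.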

Clear from proc filesystem

Another way is to free the space by truncating the deleted file through the /proc filesystem. As you are aware, every process in Linux has an entry in the /proc (process) filesystem, including links to all the files it has open. Before truncating a file that an application still holds open, make sure that the process/application is not impacted by losing its contents.

You can find the open file at /proc/<pid>/fd/<fd_number>. Take the PID and fd_number from the lsof output we saw above (the digits in the FD column, so 7u means fd 7). If you check the type of this file, it is a symbolic link to your deleted file.

root@kerneltalks # file /proc/777/fd/7
/proc/777/fd/7: broken symbolic link to `/tmp/ffiJEo5nz (deleted)'

So, in our case, we can truncate it using:

root@kerneltalks # > /proc/777/fd/7

That's it! Truncating the file releases the space that was still held by the files you had already deleted.
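To see the effect safely before trying it on a real process, the whole sequence can be rehearsed in a scratch directory. This is a sketch assuming a Linux /proc filesystem; the 1 MiB file and its name are made up, and the current shell plays the role of the process that keeps the deleted file open.

```shell
# Create a 1 MiB file, hold it open on fd 3 in this shell, then delete it.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/big.log" bs=1024 count=1024 2>/dev/null
exec 3< "$workdir/big.log"
rm -f "$workdir/big.log"

# The data is still reachable (and still occupying disk) through /proc.
before=$(wc -c < "/proc/$$/fd/3")

# Truncate the deleted file through its /proc entry.
: > "/proc/$$/fd/3"

after=$(wc -c < "/proc/$$/fd/3")
echo "bytes held before: $before, after: $after"

exec 3<&-   # tidy up the descriptor used for the demo
```

One caveat worth keeping in mind: a process that keeps writing to the file without append mode continues at its old offset after the truncation, so the file can grow back as a sparse file.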