Yearly Archives: 2017

How to resolve mount.nfs: Stale file handle error

Learn how to resolve the mount.nfs: Stale file handle error on the Linux platform. This Network File System (NFS) error can be resolved from either the client or the server end.

If you use the Network File System in your environment, you have probably seen the mount.nfs: Stale file handle error at times. It means the client holds a file handle that the server no longer recognizes, so the NFS share cannot be mounted or used: something has changed since the last known good configuration.

Typical causes are a reboot of the NFS server, NFS processes not running on the client or the server, or a share that is not properly exported on the server. The error is especially irritating when it appears on a share that was previously mounted, but that also tells you the configuration itself is correct, since the share mounted before. In such a case, try the following steps:

Make sure the NFS services are running on both the client and the server.

#  service nfs status
rpc.svcgssd is stopped
rpc.mountd (pid 11993) is running...
nfsd (pid 12009 12008 12007 12006 12005 12004 12003 12002) is running...
rpc.rquotad (pid 11988) is running...
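In the output above, only rpc.svcgssd is stopped. That daemon serves Kerberos-secured (RPCSEC_GSS) NFS, so its stopped state is usually expected when Kerberos is not in use. When the status list is long, it can help to filter it for daemons that are down. The minimal sketch below assumes the RHEL6 init-script wording ("is stopped", "dead but ...") and uses a hypothetical function name, nfs_stopped_daemons.

```shell
# Sketch: print the names of NFS daemons that the RHEL6-style init script
# reports as not running. Reads `service nfs status` output on stdin.
# nfs_stopped_daemons is a hypothetical helper name used for illustration.
nfs_stopped_daemons() {
    # Matches "<daemon> is stopped" and "<daemon> dead but ..." lines and
    # prints the first field, which is the daemon name.
    awk '/ is stopped/ || / dead but / { print $1 }'
}
```

Usage: `service nfs status | nfs_stopped_daemons`. An empty result means no daemon was reported as stopped.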

If the NFS share is currently mounted on the client, unmount it forcefully and try to remount it. Then check that it is properly mounted with the df command and by changing into the directory.

# umount -f /mydata_nfs
# mount -t nfs server:/nfs_share /mydata_nfs
# df -k
------ output clipped -----
server:/nfs_share 41943040  892928  41050112   3% /mydata_nfs

In the mount command above, server can be the IP address or the hostname of the NFS server.
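The df-and-cd check can be scripted. The sketch below uses a hypothetical function, check_nfs_mount, which runs df, cd and ls in a subshell so that any error text (for example "Stale file handle") is captured without changing your own working directory. It assumes the mount answers quickly enough to return an error: on a hard NFS mount whose server is down, df and cd may hang instead of failing. It also does not verify which filesystem backs the path, so compare the df output with server:/nfs_share yourself.

```shell
# Sketch: confirm a mount point answers to df, cd and ls.
# Returns 0 if usable, 1 otherwise. check_nfs_mount is a hypothetical name.
check_nfs_mount() {
    dir=$1
    # The command substitution runs in a subshell, so `cd` does not change
    # the caller's directory. Only error messages (stderr) are captured.
    if err=$( { df -k "$dir" >/dev/null && cd "$dir" && ls . >/dev/null; } 2>&1 ); then
        echo "$dir is mounted and readable"
        return 0
    fi
    case $err in
        *[Ss]tale*) echo "stale file handle on $dir: $err" >&2 ;;
        *)          echo "cannot use $dir: $err" >&2 ;;
    esac
    return 1
}
```

For example, `check_nfs_mount /mydata_nfs` after the remount above.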

If the forced unmount fails with an error like the one below:

# umount -f /mydata_nfs
umount2: Device or resource busy
umount: /mydata_nfs: device is busy
umount2: Device or resource busy
umount: /mydata_nfs: device is busy

then check which processes or users are using the mount point with the lsof command:

# lsof |grep mydata_nfs
lsof: WARNING: can't stat() nfs file system /mydata_nfs
      Output information may be incomplete.
su         3327      root  cwd   unknown                                                   /mydata_nfs/dir (stat: Stale NFS file handle)
bash       3484      grid  cwd   unknown                                                   /mydata_nfs/MYDB (stat: Stale NFS file handle)
bash      20092  oracle11  cwd   unknown                                                   /mydata_nfs/MPRP (stat: Stale NFS file handle)
bash      25040  oracle11  cwd   unknown                                                   /mydata_nfs/MUYR (stat: Stale NFS file handle)

In the example above, four processes (PIDs 3327, 3484, 20092 and 25040) have their current working directory (cwd) on the mount point. Kill them to free the mount point; you will then be able to unmount it properly.
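Picking the PIDs out of an lsof listing by hand gets tedious with many sessions. The sketch below (hypothetical name pids_on_mount) reads lsof output on stdin, takes the PID from the second column, and keeps only lines where a field equals the mount point or starts with it followed by a slash, so /mydata_nfs does not also match /mydata_nfs2. It assumes paths without spaces.

```shell
# Sketch: list the unique PIDs that lsof shows under a mount point.
# Usage: lsof | pids_on_mount /mydata_nfs   (pids_on_mount is hypothetical)
pids_on_mount() {
    awk -v mp="$1" '
        # Column 2 of lsof output is the PID; skip warning lines.
        $2 ~ /^[0-9]+$/ {
            for (i = 3; i <= NF; i++)
                # Exact mount point, or a path beneath it.
                if ($i == mp || index($i, mp "/") == 1) { print $2; break }
        }' | sort -un
}
```

For the listing above, `lsof | pids_on_mount /mydata_nfs` prints 3327, 3484, 20092 and 25040. Review what each process is before passing the list to kill, since these can be interactive sessions such as the oracle11 shells shown here.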

If the mount command still returns the same error, restart the NFS service on the client with the command below and then try mounting again.

#  service nfs restart
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down RPC idmapd:                                  [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
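The client-side sequence so far (force unmount, remount, and restart the client NFS service only if the mount still fails) can be sketched as a small function. remount_nfs and run are hypothetical helper names, and the commands are the RHEL6 ones used in this post. Setting DRY_RUN=1 prints the commands instead of executing them, which is a safe way to review the sequence first.

```shell
# Sketch: client-side remount with one escalation step.
# run executes its arguments, or only prints them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

remount_nfs() {
    share=$1   # e.g. server:/nfs_share (server may be an IP or hostname)
    mnt=$2     # e.g. /mydata_nfs
    # May fail with "device is busy"; see the lsof step above.
    run umount -f "$mnt"
    if run mount -t nfs "$share" "$mnt"; then
        return 0
    fi
    echo "mount failed, restarting the client NFS service" >&2
    run service nfs restart && run mount -t nfs "$share" "$mnt"
}
```

For example, `DRY_RUN=1 remount_nfs server:/nfs_share /mydata_nfs` prints the umount and mount commands; without DRY_RUN it runs them, so it must be run as root.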


If this still does not solve your issue, the final step is to restart the NFS services on the server. Caution! This disconnects every NFS share exported from that server, and all clients will see their mount points disconnect. In the vast majority of cases this step resolves the issue. If it does not, and you started seeing the error after changing the configuration, check the NFS configuration.

The outputs in this post are from a RHEL6.3 server. Drop us your comments on this post.

How to do safe and graceful Measureware service restart in HPUX

A how-to guide for a safe and graceful measureware service restart on HPUX machines. Learn how to preserve old log files during the restart instead of overwriting them.

Measureware is a native HPUX utility for performance measurement. It collects system utilization data in the background, and the Measureware agent (mwa) stores this data in log files called datafiles. If you restart the measureware service without moving the log files, it overwrites the current files and all historical data is lost. Hence you need to stop the service, move the datafiles to another location, and then start it again. In this sequence, the agents create new blank datafiles to store data.

You can view the current status of all measureware services with the command below:

# mwa status all
 Perf Agent status:
    Running scopeux               (Perf Agent data collector) pid 2814
    Running midaemon              (Measurement Interface daemon) pid 2842
    Running ttd                   (ARM registration daemon) pid 2703

 Perf Agent Server status:

    Running ovcd                  (OV control component) pid 3483
    Running ovbbccb               (BBC5 communication broker) pid 3484
    Running coda                  (perf component) pid(s) 3485
       Configured DataSources(1)
                  SCOPE

    Running perfalarm             (alarm generator) pid(s) 2845

If any component is not running or has issues, a measureware service restart may be needed. Let’s walk through the graceful shutdown and startup of measureware services in HPUX.
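Checking for stopped components can also be scripted. As the status output in the stop step shows, mwa status all marks a stopped component with a line such as "WARNING: scopeux is not active ...". The minimal sketch below assumes exactly that wording and prints the component names from such lines; mwa_inactive is a hypothetical name.

```shell
# Sketch: list measureware components reported as "not active".
# Reads `mwa status all` output on stdin. mwa_inactive is hypothetical.
mwa_inactive() {
    # Field 2 of a "WARNING: <name> is not active (...)" line is the name.
    awk '$1 == "WARNING:" && /is not active/ { print $2 }'
}
```

`mwa status all | mwa_inactive` prints nothing when every component is running, as in the status output above.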


1. Stop mwa

Stop all measureware services with a single command:

# mwa stop all

Shutting down Perf Agent collection software
         Shutting down scopeux, pid(s) 2814
         The Perf Agent collector, scopeux has been shut down successfully.
NOTE:   The ARM registration daemon ttd will be left running.

Shutting down the alarm generator perfalarm, pid(s) 2845
         The perfalarm process has terminated

OVOA is running. Not shutting down coda

As the output above shows, the command leaves ttd running. Kill it with:

# ttd -k

Also, midaemon keeps running after mwa stop all. Terminate it with:

#  midaemon -T

Together, these three commands shut down the measureware data collection components; coda and the OV components (ovcd, ovbbccb) stay up, as the output below shows. Confirm that scopeux, midaemon and ttd are down by running the status command again:

#  mwa status all
 Perf Agent status:
WARNING: scopeux    is not active (Perf Agent data collector)
WARNING: midaemon   is not active (Measurement Interface daemon)
WARNING: ttd        is not active (ARM registration daemon)

 Perf Agent Server status:

    Running ovcd                  (OV control component) pid 3483
    Running ovbbccb               (BBC5 communication broker) pid 3484
    Running coda                  (perf component) pid(s) 3485
       Configured DataSources(1)
                  SCOPE

WARNING: perfalarm is not active (alarm generator)

With the collectors stopped, you can proceed to move the logs before starting mwa again.
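The stop step boils down to three commands run in a fixed order. The sketch below wraps them in a hypothetical stop_mwa function; as in the output above, coda may stay up while OVOA is running. The run helper (also hypothetical) prints the commands instead of executing them when DRY_RUN=1.

```shell
# Sketch: graceful measureware stop, in the order used in this post.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

stop_mwa() {
    run mwa stop all || return 1   # stops scopeux and perfalarm
    run ttd -k                     # ARM registration daemon, left running by mwa
    run midaemon -T                # Measurement Interface daemon
}
```

After `stop_mwa`, run mwa status all and check that scopeux, midaemon and ttd show as not active before touching the datafiles.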

2. Log movement

The datafiles (all names start with log) reside in the /var/opt/perf/datafiles directory:

# ll /var/opt/perf/datafiles/log*
-rw-r--r--   1 root       users      11064908 Jan  1 03:05 /var/opt/perf/datafiles/logappl
-rw-r--r--   1 root       root       43951620 Jan  1 03:05 /var/opt/perf/datafiles/logdev
-rw-r--r--   1 root       users      9556384 Jan  1 03:05 /var/opt/perf/datafiles/logglob
-rw-r--r--   1 root       root         15716 Jan  1 03:01 /var/opt/perf/datafiles/logindx
-rw-r--r--   1 root       users           15 Nov  4  2009 /var/opt/perf/datafiles/logpcmd0
-rw-r--r--   1 root       root       76492020 Jan  1 03:05 /var/opt/perf/datafiles/logproc
-rw-r--r--   1 root       root       96153856 Jan  1 03:05 /var/opt/perf/datafiles/logtran

Now save the current datafiles to a different directory. You can use the short inline commands below, or copy the files manually one by one.

# cd /var/opt/perf/datafiles
# nowis=`date +%d%b%y-%H:%M`
# mkdir /var/opt/perf/datafiles.old.$nowis
# cp log* /var/opt/perf/datafiles.old.$nowis

Make sure the datafiles were copied to the destination correctly, then proceed to start the services again.
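To make that check less manual, the sketch below copies the log* datafiles into a timestamped directory named like the inline commands above (datafiles.old.<date>) and compares each copy with cmp before reporting success. backup_datafiles is a hypothetical helper; it takes the datafiles directory as an optional argument (defaulting to /var/opt/perf/datafiles) and assumes the collectors are already stopped, since a file still being written would not match its copy.

```shell
# Sketch: copy log* datafiles to <dir>.old.<date> and verify each copy.
# Prints the backup directory on success. backup_datafiles is hypothetical.
backup_datafiles() {
    src=${1:-/var/opt/perf/datafiles}
    dest=$src.old.$(date +%d%b%y-%H:%M)   # same naming as the inline commands
    mkdir "$dest" || return 1
    n=0
    for f in "$src"/log*; do
        [ -f "$f" ] || continue            # skips the literal pattern if no match
        cp -p "$f" "$dest/" || return 1    # -p keeps timestamps and modes
        # Byte-for-byte comparison of original and copy.
        cmp -s "$f" "$dest/${f##*/}" || { echo "copy mismatch: $f" >&2; return 1; }
        n=$((n + 1))
    done
    [ "$n" -gt 0 ] || { echo "no log* files in $src" >&2; return 1; }
    echo "$dest"
}
```

It returns non-zero if no datafiles are found or a copy differs. Running it twice within the same minute fails at mkdir, because the directory name only has minute resolution.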

3. Start mwa

Start mwa with the command below:

#  mwa start all

The Perf Agent scope collector is being started.
         The ARM registration daemon
         /opt/perf/bin/ttd has been started.

         The Performance collection daemon
         /opt/perf/bin/scopeux has been started.

         The coda daemon /opt/OV/lbin/perf/coda is already running.
The Perf Agent alarm generator is being started.
         The alarm generator /opt/perf/bin/perfalarm
         has been started.

Note that shutting down took three commands for the different components, while a single command starts them all. Run mwa status all to make sure every component has started. That pretty much sums up how to do a safe and graceful measureware service restart.
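The start-then-check advice can be combined into one step. The sketch below, with the hypothetical name mwa_start_and_verify, runs mwa start all and then uses the "is not active" wording from the status output to list anything that did not come up. It checks only once; if a component needs a few seconds to start, run mwa status all again before assuming a failure.

```shell
# Sketch: start measureware and report components still "not active".
# mwa_start_and_verify is a hypothetical helper name.
mwa_start_and_verify() {
    mwa start all || return 1
    inactive=$(mwa status all | awk '$1 == "WARNING:" && /is not active/ { print $2 }')
    if [ -n "$inactive" ]; then
        # Unquoted on purpose: joins the names onto one line.
        echo "not running after start:" $inactive >&2
        return 1
    fi
    echo "all measureware components running"
}
```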

All examples in this post are from a machine running HPUX 11.31. Let us know in the comments if you have any queries, suggestions or corrections.