e-domen dot com Blog


Troubleshooting UNIX Systems with Lsof

Posted in Доки by maxim on the Октябрь 18th, 2006

One of the least-talked-about tools in a UNIX sysadmin’s toolkit is lsof. Lsof lists information about files opened by processes. But that’s really an understatement.

Most people forget that, in UNIX, (almost) everything is a file. The OS makes hardware available to applications by way of files in /dev. Kernel, system, memory, device etc. information in made available inside files in /proc. TCP/UDP sockets are sometimes represented internally as files. Even directories are really just files containing other filenames.

Lsof works by examining kernel data-structures and provides a variety of information related to files, pipes, sockets and more.

Lsof is installed by default on most Linux distributions, BSD distributions and OS X. Binary packages for Solaris, AIX, HP-UX, *cough*SCO OpenServer*cough* and many other UNIXes (Unices?) are available on the web.

So, just how useful is lsof?

Deciphering its Output

Switch to root, and type lsof on the commandline.

linux# lsof
COMMAND     PID     USER   FD      TYPE     DEVICE     SIZE       NODE NAME
init          1     root  cwd       DIR       3,65     4096          2 /
init          1     root  rtd       DIR       3,65     4096          2 /
init          1     root  txt       REG       3,65    29556     172317 /sbin/init
init          1     root  mem       REG       3,65  1166880      93908 /lib/libc-2.3.5.so
init          1     root  mem       REG       3,65   103053      93909 /lib/ld-2.3.5.so
init          1     root   10u     FIFO       3,65               48438 /dev/initctl
ksoftirqd     2     root  cwd       DIR       3,65     4096          2 /
ksoftirqd     2     root  rtd       DIR       3,65     4096          2 /
ksoftirqd     2     root  txt   unknown                                /proc/2/exe
events/0      3     root  cwd       DIR       3,65     4096          2 /
events/0      3     root  rtd       DIR       3,65     4096          2 /
events/0      3     root  txt   unknown                                /proc/3/exe

...SNIP...

syslog-ng  6529     root  txt       REG       3,69   114132      84690 /usr/sbin/syslog-ng
syslog-ng  6529     root  mem       REG       3,65  1166880      93908 /lib/libc-2.3.5.so
syslog-ng  6529     root  mem       REG       3,65    64568      93943 /lib/libresolv-2.3.5.so
syslog-ng  6529     root  mem       REG       3,65    75176      93924 /lib/libnsl-2.3.5.so
syslog-ng  6529     root  mem       REG       3,65   103053      93909 /lib/ld-2.3.5.so
syslog-ng  6529     root    0u      CHR        1,3               47320 /dev/null
syslog-ng  6529     root    1u      CHR        1,3               47320 /dev/null
syslog-ng  6529     root    2u      CHR        1,3               47320 /dev/null
syslog-ng  6529     root    3u     unix 0xdea00e00              672127 /dev/log

...SNIP...

asterisk   7001     root   10u     IPv4       6015                 TCP localhost:5038 (LISTEN)
asterisk   7001     root   11r     FIFO       3,70                 306 /var/run/asterisk/autod
ial.ctl
asterisk   7001     root   12u     IPv4       6834                 UDP *:5060
asterisk   7001     root   13r     FIFO        0,5                6019 pipe
asterisk   7001     root   14u     IPv4       6016                 TCP localhost:5038->localho
st:32768 (ESTABLISHED)
asterisk   7001     root   15u     IPv4       6835                 UDP *:2727
asterisk   7001     root   16u     IPv4       6861                 UDP *:4569
asterisk   7001     root   17u      REG       3,70        0     593222 /var/lib/asterisk/astdb
asterisk   7001     root   18r     FIFO        0,5                6883 pipe
asterisk   7001     root   19u      REG       3,70    39402      32066 /var/tmp/iaxy.bin-19098
89093 (deleted)
asterisk   7001     root   20w     FIFO        0,5                6883 pipe

...LOTS MORE SNIPPED...

What you will be presented with is a very long list of open files, which you might want to pipe through your favourite pager.

By default (on Linux), lsof displays the following information about each open file:

  • COMMAND: The name of the UNIX command associated with the process.
  • PID: The Process ID.
  • USER: The user ID or login name of the user to whom the process belongs.
  • FD: The file descriptor number of the file or a code representing more information about the structure. See manual page for details.
  • TYPE: The type of the node associated with the file. E.g. REG signifies a regular file, IPv4 or IPv6 signifies an IP socket, DIR a directory, “unix” a UNIX domain socket, etc.
  • DEVICE: Usually contains major and minor device numbers for the files, or addresses/references for other structures.
  • SIZE: The size of the file or the file offset, in bytes. (If available.) In the case of files that don’t have true sizes (eg., sockets, pipes), lsof displays the size of the content their kernel buffer descriptors.
  • NODE: Node number / inode / Internet protocol type (TCP) etc.
  • NAME: The name of the file / mount point / device / Internet address / etc.

For a comprehensive description of these fields, refer the lsof manual page.

Since lsof works by examining kernel memory, you will need root access to be able to fully utilize it. A non-root user will not have access to information that belongs to other users.

Common Usage

Lsof is usually run with one or more of the following options:

  • /path/to/file: List processes, owners and open file descriptors that are currently using the specified file.
  • -i [46][protocol][@hostname|hostaddr][:service|port]: List Internet files / sockets.
  • -u name: List files owned by user.
  • -p pid: List files open by specified process.
  • -t: Terse output. No headers, only PIDs. Useful within scripts.
  • -n: Disable resolving of network names.
  • -N: List NFS files

These options are ORed by default.

Display all internet files OR files opened by user “foobar”.

# lsof -u foobar -i

To display all internet files that are opened by foobar, you need to apply the AND (-a) condition between the switches.

# lsof -u foobar -a -i

The following recipes demonstrate how lsof can be used to troubleshoot real-world problems.

Recipe #1: Finding Port Hogs

Your web-server is refusing to come up because port 80 is in use by another process. How do you track down the offending process?

# lsof -i

... SNIP ...

asterisk   7554     root   16u  IPv4   6861       UDP *:4569
postmaste  7688 postgres    5u  IPv4   5955       UDP localhost:32768->localhost:32768
postmaste  7689 postgres    5u  IPv4   5955       UDP localhost:32768->localhost:32768
sshd      27038     root    3u  IPv4 677971       TCP reddwarf:ssh->CPE.xxxx.com:61702 (ESTABLISHED)
sshd      27043    mohit    3u  IPv4 677971       TCP reddwarf:ssh->CPE.xxxx.com:61702 (ESTABLISHED)

... SNIP ...

Nice. A list of open Internet sockets, along with the processes, addresses and owners. Also note that (similar to netstat), the TCP states are displayed. Above, we can see two established ssh sessions in progress.

Let’s add a port filter and find exactly what we’re looking for.

# lsof -i TCP:80
COMMAND   PID     USER   FD   TYPE DEVICE SIZE NODE NAME
lighttpd 7356 lighttpd    3u  IPv4   6409       TCP *:http (LISTEN)

Okay, so lighttpd is the reason why Apache won’t run. That’s probably a good thing.

Recipe #2: Finding Processes Within a Given Port Range

You need to find a range of free ports for your new multimedia application.

# lsof -i TCP:5000-5200
COMMAND   PID USER   FD   TYPE DEVICE SIZE NODE NAME
asterisk 7001 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7001 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)
asterisk 7002 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7002 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)
asterisk 7039 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7039 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)
asterisk 7040 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7040 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)
asterisk 7041 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7041 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)
asterisk 7042 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7042 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)
asterisk 7044 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7044 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)
perl     7046 root    3u  IPv4   6054       TCP *:5100 (LISTEN)
perl     7046 root    4u  IPv4   6055       TCP *:5101 (LISTEN)
perl     7046 root    6u  IPv4   6056       TCP localhost:32768->localhost:5038 (ESTABLISHED)
asterisk 7073 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7073 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)
asterisk 7504 root   10u  IPv4   6015       TCP localhost:5038 (LISTEN)
asterisk 7504 root   14u  IPv4   6016       TCP localhost:5038->localhost:32768 (ESTABLISHED)

Recipe #3: Listing User Files

What files do users “foobar” and “apache” have open?

# lsof -u foobar,apache

List UDP ports in use by user “mohit”.

# lsof -i UDP -a -u mohit

Who’s responding to “who”?

# lsof -i UDP:who

Recipe #4: Unmounting a Disk or Filesystem

Sometimes you need to track down the user or process that’s blocking you from unmounting a disk.

# umount /opt
umount: /opt: device is busy
umount: /opt: device is busy
# mount | grep "/opt"
/dev/hdb9 on /opt type ext3 (rw,noatime)
# lsof /dev/hdb9
COMMAND    PID     USER   FD   TYPE DEVICE    SIZE    NODE NAME
perl      7046     root    2w   REG   3,73     111 1376386 /opt/local/paynacea/var/state/callmanager.pid.err
perl      7046     root    5w   REG   3,73    6783 1376385 /opt/local/paynacea/var/log/callmanager.log
# kill 7046
# umount /opt

Or the simpler:

# kill `lsof -t /opt`

Recipe #5: Finding Device Hogs

Who’s using the audio manager?

# lsof /dev/audio

Why can’t I start my alternate logger?

# lsof /dev/log

Why doesn’t my CD eject?

# lsof /dev/cdrom

Recipe #6: Using Exclusions

The ‘^’ (negated) modifier can prefix the User or Process ID parameters to exclude them from the resulting list. Since they represent exclusions, they are applied without ORing or ANDing and take effect before any other selection criteria are applied.

List all Internet files/sockets open by non-root users.

# lsof -i -u^root

Recipe #7: Recursing Directories

The ‘+D’ option causes lsof to search for open files within a specified directory, recursing down to its complete depth.

List all processes that have files open in /tmp.

# lsof +D /tmp

The ‘+d’ option does the same thing, but does _not_ descend the directory tree.

Recipe #8: Matching by Process Name

List all files open by processes beginning with the letters mpg.

# lsof -c mpg

Using a regular-expression.

# lsof -c '/post.*er/'

Recipe #9: Examining Suspicious Processes

Lsof can be used along with strace to examine and monitor the operation of viruses, worms or spyware.

What files are opened by PID 14554?

# lsof -p 14554

Who’s looking at the password file?

# lsof /etc/passwd

Recipe #10: Repeat Mode

The -r switch puts lsof in repeat mode. It delays every 15 seconds (unless specified), and displays another listing.

Watching a user’s open files every 5 seconds:

# lsof -u badcop -r5

Monitoring the password file:

# lsof /etc/passwd -r 2

Recipe #11: Finding Deleted Open Files

This recipe was added on 26/Mar/06 after an anonymous poster left a comment regarding deleted files.

One of the most annoying problems is a file-system quickly running out of space, without a hint of what file is responsible for it. This happens when a file (usually a log-file), gets deleted while it’s still being written to. When you delete an open file, the kernel unlinks the file from the directory, but cannot remove the inode, since it’s still open.

This causes the file to continue to grow, with no trace of its existance anywhere. Well… almost anywhere.

Lsof provides the +L parameter to list the number of link counts an open file has. When followed by a number, lsof only displays files with link counts less thatn the specified number.

mohit@reddwarf ~ $ lsof +L3
COMMAND   PID  USER   FD   TYPE DEVICE    SIZE NLINK      NODE NAME
sshd    11540 mohit  mem    REG   3,69  303448     1     85869 /usr/sbin/sshd
sshd    11540 mohit  mem    REG   3,65   35404     1     94075 /lib/libnss_nis-2.3.5.so
sshd    11540 mohit  mem    REG   3,65   30928     1     94086 /lib/libnss_compat-2.3.5.so
sshd    11540 mohit  mem    REG   3,65   35236     1     93958 /lib/libnss_files-2.3.5.so
sshd    11540 mohit  mem    REG   3,65   28444     1     94094 /lib/libcrack.so.2.8.0

A deleted file has zero links. So the following command displays deleted-but-open files on a system.

$ lsof +L1

Display a list of deleted-but-open files within a specific filesystem.

$ lsof +aL1 /tmp

Finally

We barely scratched the surface with the above recipes, but as you can see, lsof is a powerful troubleshooting tool. I’d be interested in learning what other users do with lsof. Toy with it, tinker with it, use it and let me know how it has helped you.

Взято здесь

Comments are closed.