NFS Tracing(1).txt

(34 KB) Pobierz

NFS Tracing By Passive Network Monitoring

Matt Blaze

Department of Computer Science Princeton University mab@cs.princeton.edu

ABSTRACT

Traces of filesystem activity have proven to be useful for a wide variety of
purposes, rang ing from quantitative analysis of system behavior to
trace-driven simulation of filesystem algo rithms. Such traces can be
difficult to obtain, however, usually entailing modification of the
filesystems to be monitored and runtime overhead for the period of the
trace. Largely because of these difficulties, a surprisingly small number of
filesystem traces have been conducted, and few sample workloads are
available to filesystem researchers.

This paper describes a portable toolkit for deriving approximate traces of
NFS [1] activity by non-intrusively monitoring the Ethernet traffic to and
from the file server. The toolkit uses a promiscuous Ethernet listener
interface (such as the Packetfilter[2]) to read and reconstruct NFS-related
RPC packets intended for the server. It produces traces of the NFS activity
as well as a plausible set of corresponding client system calls. The tool is
currently in use at Princeton and other sites, and is available via
anonymous ftp.

1. Motivation

Traces of real workloads form an important part of virtually all analysis of
computer system behavior, whether it is program hot spots, memory access
patterns, or filesystem activity that is being studied. In the case of
filesystem activity, obtaining useful traces is particularly challenging.
Filesystem behavior can span long time periods, often making it necessary to
collect huge traces over weeks or even months. Modification of the
filesystem to collect trace data is often difficult, and may result in
unacceptable runtime overhead. Distributed filesystems exa cerbate these
difficulties, especially when the network is composed of a large number of
heterogeneous machines. As a result of these difficulties, only a relatively
small number of traces of Unix filesystem workloads have been conducted,
primarily in computing research environments. [3], [4] and [5] are examples
of such traces.

Since distributed filesystems work by transmitting their activity over a
network, it would seem reasonable to obtain traces of such systems by
placing a "tap" on the network and collecting trace data based on the
network traffic. Ethernet[6] based networks lend themselves to this approach
particularly well, since traffic is broadcast to all machines connected to a
given subnetwork. A number of general-purpose network monitoring tools are
avail able that "promiscuously" listen to the Ethernet to which they are
connected; Sun's etherfind[7] is an example of such a tool. While these
tools are useful for observing (and collecting statistics on) specific types
of packets, the information they provide is at too low a level to be useful
for building filesystem traces. Filesystem operations may span several
packets, and may be meaningful only in the context of other, previous
operations.

Some work has been done on characterizing the impact of NFS traffic on
network load. In [8], for example, the results of a study are reported in
which Ethernet traffic was monitored and statistics gathered on NFS
activity. While useful for understanding traffic patterns and developing a
queueing model of NFS loads, these previous stu dies do not use the network
traffic to analyze the file access traffic patterns of the system, focusing
instead on developing a statistical model of the individual packet sources,
destinations, and types.


This paper describes a toolkit for collecting traces of NFS file access
activity by monitoring Ethernet traffic. A "spy" machine with a promiscuous
Ethernet interface is connected to the same network as the file server. Each
NFS-related packet is analyzed and a trace is produced at an appropriate
level of detail. The tool can record the low level NFS calls themselves or
an approximation of the user-level system calls (open, close, etc.) that
triggered the activity.

We partition the problem of deriving NFS activity from raw network traffic
into two fairly distinct subprob lems: that of decoding the low-level NFS
operations from the packets on the network, and that of translating these
low-level commands back into user-level system calls. Hence, the toolkit
consists of two basic parts, an "RPC decoder" (rpcspy) and the "NFS
analyzer" (nfstrace). rpcspy communicates with a low-level network
monitoring facility (such as Sun's NIT [9] or the Packetfilter [2]) to read
and reconstruct the RPC transactions (call and reply) that make up each NFS
command. nfstrace takes the output of rpcspy and reconstructs the sys tem
calls that occurred as well as other interesting data it can derive about
the structure of the filesystem, such as the mappings between NFS file
handles and Unix file names. Since there is not a clean one-to-one mapping
between system calls and lower-level NFS commands, nfstrace uses some simple
heuristics to guess a reasonable approximation of what really occurred.

1.1. A Spy's View of the NFS Protocols

It is well beyond the scope of this paper to describe the protocols used by
NFS; for a detailed description of how NFS works, the reader is referred to
[10], [11], and [12]. What follows is a very brief overview of how NFS
activity translates into Ethernet packets.

An NFS network consists of servers, to which filesystems are physically
connected, and clients, which per form operations on remote server
filesystems as if the disks were locally connected. A particular machine can
be a client or a server or both. Clients mount remote server filesystems in
their local hierarchy just as they do local filesystems; from the user's
perspective, files on NFS and local filesystems are (for the most part)
indistinguishable, and can be manipulated with the usual filesystem calls.

The interface between client and server is defined in terms of 17 remote
procedure call (RPC) operations. Remote files (and directories) are referred
to by a file handle that uniquely identifies the file to the server. There
are operations to read and write bytes of a file (read, write), obtain a
file's attributes (getattr), obtain the contents of directories (lookup,
readdir), create files (create), and so forth. While most of these
operations are direct analogs of Unix system calls, notably absent are open
and close operations; no client state information is maintained at the
server, so there is no need to inform the server explicitly when a file is
in use. Clients can maintain buffer cache entries for NFS files, but must
verify that the blocks are still valid (by checking the last write time with
the getattr operation) before using the cached data.

An RPC transaction consists of a call message (with arguments) from the
client to the server and a reply mes sage (with return data) from the server
to the client. NFS RPC calls are transmitted using the UDP/IP connection
less unreliable datagram protocol[13]. The call message contains a unique
transaction identifier which is included in the reply message to enable the
client to match the reply with its call. The data in both messages is
encoded in an "external data representation" (XDR), which provides a
machine-independent standard for byte order, etc.

Note that the NFS server maintains no state information about its clients,
and knows nothing about the context of each operation outside of the
arguments to the operation itself.

2. The rpcspy Program

rpcspy is the interface to the system-dependent Ethernet monitoring
facility; it produces a trace of the RPC calls issued between a given set of
clients and servers. At present, there are versions of rpcspy for a number
of BSD-derived systems, including ULTRIX (with the Packetfilter[2]), SunOS
(with NIT[9]), and the IBM RT running AOS (with the Stanford enet filter).

For each RPC transaction monitored, rpcspy produces an ASCII record
containing a timestamp, the name of the server, the client, the length of
time the command took to execute, the name of the RPC command executed, and
the command- specific arguments and return data. Currently, rpcspy
understands and can decode the 17 NFS RPC commands, and there are hooks to
allow other RPC services (for example, NIS) to be added reasonably easily.


The output may be read directly or piped into another program (such as
nfstrace) for further analysis; the for mat is designed to be reasonably
friendly to both the human reader and other programs (such as nfstrace or
awk).

Since each RPC transaction consists of two messages, a call and a reply,
rpcspy waits until it receives both these components and emits a single
record for the entire transaction. The basic output format is 8 vertical-bar
separated fields:

timestamp | execution-time | server | client | command-name | arguments |
reply-data

where timestamp is the time the reply message was received, execution-time
is the time (in microseconds) that elapsed between the call and reply,
server is the name (or IP address) of the server, client is the name (or IP
address) of the client followed by the userid that issued the command,
command-name is the name of the particular program invoked (read, write,
getattr, etc.), and arguments and reply-data are the command dependent
arguments and return values passed to and from the RPC program,
respectively.

The exact format of the argument and reply data is dependent on the specific
command issued and the level of detail the user wants logged. For example, a
typical NFS command is recorded as follows:

690529992.167140 | 11717 | paramount | merckx.321 | read |
{"7b1f00000000083c", 0, 8192} | ok, 1871

In this example, uid 321 at client "merckx" issued an NFS read command to
server "paramount". The reply was issued at (Unix time) 690529992.167140
seconds; the call command occurred 11717 microseconds earlier. Three
arguments are logged for the read call: the file handle from which to read
(repr...
Zgłoś jeśli naruszono regulamin