===================
Setting up NFS/RDMA
===================

:Author:
  NetApp and Open Grid Computing (May 29, 2008)

.. warning::
  This document is probably obsolete.

Overview
========

This document describes how to install and set up the Linux NFS/RDMA client
and server software.

The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.

In our testing, we have obtained excellent performance results (full 10 Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both InfiniBand and iWARP
RDMA adapters.

Getting Help
============

If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net mailing list.

Installation
============

These instructions are a step-by-step guide to building a machine for
use with NFS/RDMA.

- Install an RDMA device

  Any device supported by the drivers in drivers/infiniband/hw is acceptable.

  Testing has been performed using several Mellanox-based IB cards, the
  Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

- Install a Linux distribution and tools

  The first kernel release to contain both the NFS/RDMA client and server was
  Linux 2.6.25. Therefore, a distribution compatible with this or a subsequent
  Linux kernel release should be installed.

  The procedures described in this document have been tested with
  distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

- Install nfs-utils-1.1.2 or greater on the client

  An NFS/RDMA mount point can be obtained by using the mount.nfs command in
  nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
  version with support for NFS/RDMA mounts, but for various reasons we
  recommend using nfs-utils-1.1.2 or greater). To see which version of
  mount.nfs you are using, type:

  .. code-block:: sh

    $ /sbin/mount.nfs -V

  If the version is less than 1.1.2 or the command does not exist,
  you should install the latest version of nfs-utils.

  Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs

  Uncompress the package and follow the installation instructions.

  If you will not need the idmapper and gssd executables (you do not need
  these to create an NFS/RDMA enabled mount command), the installation
  process can be simplified by disabling these features when running
  configure:

  .. code-block:: sh

    $ ./configure --disable-gss --disable-nfsv4

  To build nfs-utils you will need the tcp_wrappers package installed. For
  more information on this, see the package's README and INSTALL files.

  After building the nfs-utils package, there will be a mount.nfs binary in
  the utils/mount directory. This binary can be used to initiate NFS v2, v3,
  or v4 mounts. To initiate a v4 mount, the binary must be called
  mount.nfs4. The standard technique is to create a symlink called
  mount.nfs4 to mount.nfs.

  This mount.nfs binary should be installed at /sbin/mount.nfs as follows:

  .. code-block:: sh

    $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs

  In this location, mount.nfs will be invoked automatically for NFS mounts
  by the system mount command.

  .. note::
    mount.nfs, and therefore nfs-utils-1.1.2 or greater, is only needed
    on the NFS client machine. You do not need this specific version of
    nfs-utils on the server. Furthermore, only the mount.nfs command from
    nfs-utils-1.1.2 is needed on the client.
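
The version comparison described for mount.nfs above can be scripted. The
following is a minimal sketch and not part of nfs-utils: version_at_least is a
hypothetical helper that compares two dotted version strings and assumes every
component is a plain decimal number. Extracting the version number from the
output of "mount.nfs -V" is left to the caller.

```shell
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Assumption: every component is a plain decimal number (no "-rc1" suffixes).
version_at_least() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        # Take the leading component of each version.
        h=${have%%.*}
        w=${want%%.*}
        # Drop that component; clear the string once no dot remains.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
        # A missing component counts as 0, so 1.1 compares like 1.1.0.
        h=${h:-0}
        w=${w:-0}
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
    done
    return 0
}
```

For example, "version_at_least 1.1.1 1.1.2" fails, which corresponds to the
case where a newer nfs-utils should be installed on the client.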

- Install a Linux kernel with NFS/RDMA

  The NFS/RDMA client and server are both included in the mainline Linux
  kernel version 2.6.25 and later. This and other versions of the Linux
  kernel can be found at: https://www.kernel.org/pub/linux/kernel/

  Download the sources and place them in an appropriate location.

- Configure the RDMA stack

  Make sure your kernel configuration has RDMA support enabled. Under
  Device Drivers -> InfiniBand support, update the kernel configuration
  to enable InfiniBand support [NOTE: the option name is misleading. Enabling
  InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].

  Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
  iWARP adapter support (amso, cxgb3, etc.).

  If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

- Configure the NFS client and server

  Your kernel configuration must also have NFS file system support and/or
  NFS server support enabled. These and other NFS related configuration
  options can be found under File Systems -> Network File Systems.
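
One way to confirm the options from the two configuration steps above is to
read them back from the kernel configuration file. The sketch below is only an
illustration: kconfig_get is a hypothetical helper, and the path of the
configuration file (the .config in the build tree, or a copy that a
distribution may install elsewhere) is an assumption the caller has to supply.

```shell
# Hypothetical helper: print y, m, or n for a symbol in a .config-style file.
# $1 is the configuration file, $2 the symbol name without the CONFIG_ prefix.
# A symbol that is absent or marked "is not set" is reported as n.
kconfig_get() {
    file=$1
    symbol=$2
    value=$(sed -n "s/^CONFIG_${symbol}=\([ym]\)\$/\1/p" "$file")
    echo "${value:-n}"
}

# Report the options named in the steps above for one configuration file.
report_nfs_rdma_config() {
    for symbol in INFINIBAND INFINIBAND_IPOIB NFS_FS NFSD; do
        echo "CONFIG_$symbol=$(kconfig_get "$1" "$symbol")"
    done
}
```

Calling "report_nfs_rdma_config .config" from the top of the kernel source tree
prints one line per option, so a missing RDMA or NFS setting shows up as "=n".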

- Build, install, reboot

  The NFS/RDMA code will be enabled automatically if NFS and RDMA
  are turned on. The NFS/RDMA client and server are configured via the hidden
  SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
  value of SUNRPC_XPRT_RDMA will be:

  #. N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client
     and server will not be built

  #. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
     in this case the NFS/RDMA client and server will be built as modules

  #. Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client
     and server will be built into the kernel

  Therefore, if you have followed the steps above and turned on NFS and RDMA,
  the NFS/RDMA client and server will be built.

  Build a new kernel, install it, boot it.
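
The three rules for SUNRPC_XPRT_RDMA listed above amount to taking the weaker
of the two settings. The sketch below restates them as a small shell function;
xprt_rdma_value is a hypothetical name used only for illustration, and the
function models the rules as written here rather than running the kernel's own
configuration tools.

```shell
# Hypothetical helper: print the value SUNRPC_XPRT_RDMA takes for the given
# SUNRPC ($1) and INFINIBAND ($2) settings, each one of y, m, or n.
xprt_rdma_value() {
    sunrpc=$1
    infiniband=$2
    if [ "$sunrpc" = n ] || [ "$infiniband" = n ]; then
        # Rule 1: either option off, NFS/RDMA is not built.
        echo n
    elif [ "$sunrpc" = m ] || [ "$infiniband" = m ]; then
        # Rule 2: both on and at least one modular, NFS/RDMA is built as modules.
        echo m
    else
        # Rule 3: both built in, NFS/RDMA is built into the kernel.
        echo y
    fi
}
```

For instance, "xprt_rdma_value y m" prints m, which is the case where the
transport modules have to be loaded with modprobe before use.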

Check RDMA and NFS Setup
========================

Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that it is working correctly.
In particular, verify that the RDMA stack is functioning as expected
and that standard NFS over TCP/IP and/or UDP/IP is working properly.

- Check RDMA Setup

  If you built the RDMA components as modules, load them at
  this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
  card:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib

  If you are using InfiniBand, make sure there is a Subnet Manager (SM)
  running on the network. If your IB switch has an embedded SM, you can
  use it. Otherwise, you will need to run an SM, such as OpenSM, on one
  of your end nodes.

  If an SM is running on your network, you should see the following:

  .. code-block:: sh

    $ cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

  where driverX is mthca0, ipath5, ehca3, etc.

  To further test the InfiniBand software stack, use IPoIB (this
  assumes you have two IB hosts named host1 and host2):

  .. code-block:: sh

    host1$ ip link set dev ib0 up
    host1$ ip address add dev ib0 a.b.c.x/24
    host2$ ip link set dev ib0 up
    host2$ ip address add dev ib0 a.b.c.y/24
    host1$ ping a.b.c.y
    host2$ ping a.b.c.x

  Each address is given with its prefix length (/24 is only an example);
  without one, ip assigns a /32 address and the hosts get no on-link route
  to each other.

  For other device types, follow the appropriate procedures.
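
The port state check shown above can be repeated for every RDMA device and
port at once. The following is a rough sketch: ib_ports_active is a
hypothetical helper, and it assumes that each state file holds a single line
such as "4: ACTIVE", as in the example above. The sysfs directory can be
overridden with an argument, which is mainly useful for trying the function
out on a machine without RDMA hardware.

```shell
# Hypothetical helper: print "<device> port <n>: <state>" for every port found
# under the given directory (default /sys/class/infiniband).
# Returns non-zero if no port is found or if any port is not ACTIVE.
ib_ports_active() {
    base=${1:-/sys/class/infiniband}
    status=0
    found=0
    for state_file in "$base"/*/ports/*/state; do
        # An unmatched glob stays literal; skip it.
        [ -r "$state_file" ] || continue
        found=1
        state=$(cat "$state_file")
        port_dir=${state_file%/state}
        port=${port_dir##*/}
        dev_dir=${port_dir%/ports/*}
        dev=${dev_dir##*/}
        echo "$dev port $port: $state"
        case $state in
            *ACTIVE*) ;;
            *) status=1 ;;
        esac
    done
    [ "$found" -eq 1 ] || status=1
    return "$status"
}
```

A failing return value on an InfiniBand host is a hint to check the cabling
and the Subnet Manager before going on to the IPoIB test.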

- Check NFS Setup

  For the NFS components enabled above (client and/or server),
  test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
==============

We recommend that you use two machines, one to act as the client and
one to act as the server.

One-time configuration:
-----------------------

- On the server system, configure the /etc/exports file and start the NFS/RDMA server.

  Exports entries with the following formats have been tested::

    /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

  The IP addresses are the client's IPoIB addresses for an InfiniBand
  HCA or the client's iWARP addresses for an RNIC.

  .. note::
    The "insecure" option must be used because the NFS/RDMA client does
    not use a reserved port.
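
Because a forgotten "insecure" option is easy to overlook, the exports file
can be checked for it before the server is started. This is a simplified
sketch under stated assumptions: exports_missing_insecure is a hypothetical
helper, and it only understands the one-line "path client(options)" form shown
above, not quoted paths or continuation lines.

```shell
# Hypothetical helper: for the export path $2 in the exports-style file $1,
# print every client entry whose option list lacks "insecure".
# Prints nothing when all entries for that path carry the option.
exports_missing_insecure() {
    file=$1
    export_path=$2
    awk -v path="$export_path" '
        $1 == path {
            for (i = 2; i <= NF; i++) {
                # Turn "client(a,b,c)" into ",a,b,c," so that each option
                # can be matched as a whole word between commas.
                opts = $i
                sub(/^[^(]*\(/, ",", opts)
                sub(/\)$/, ",", opts)
                if (index(opts, ",insecure,") == 0)
                    print $i
            }
        }
    ' "$file"
}
```

Running "exports_missing_insecure /etc/exports /vol0" on the server lists the
client entries that would reject NFS/RDMA mounts for that export.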

Each time a machine boots:
--------------------------

- Load and configure the RDMA drivers

  For InfiniBand using a Mellanox adapter:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib
    $ ip link set dev ib0 up
    $ ip addr add dev ib0 a.b.c.d/24

  .. note::
    Please use unique addresses for the client and server! Include the
    prefix length (/24 is only an example); without one, ip assigns a /32
    address.

- Start the NFS server

  If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA transport module:

  .. code-block:: sh

    $ modprobe svcrdma

  Regardless of how the server was built (module or built-in), start the
  server:

  .. code-block:: sh

    $ /etc/init.d/nfs start

  or

  .. code-block:: sh

    $ service nfs start

  Instruct the server to listen on the RDMA transport:

  .. code-block:: sh

    $ echo rdma 20049 > /proc/fs/nfsd/portlist
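
After writing to the portlist file as shown above, the listener can be
confirmed by reading the same file back. The sketch below assumes that reading
the file returns one "<transport> <port>" line per listener; that format and
the helper name nfsd_listens_on_rdma are assumptions made for illustration.

```shell
# Hypothetical helper: succeed if the portlist-style file $1 (default
# /proc/fs/nfsd/portlist) contains an "rdma <port>" line for port $2
# (default 20049, the port used in this document).
# Assumption: the file lists one "<transport> <port>" pair per line.
nfsd_listens_on_rdma() {
    portlist=${1:-/proc/fs/nfsd/portlist}
    port=${2:-20049}
    grep -q "^rdma $port\$" "$portlist"
}
```

A failure here, right after the echo command, suggests that the RDMA transport
module was not loaded before the server was asked to listen.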

- On the client system

  If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA client module:

  .. code-block:: sh

    $ modprobe xprtrdma

  Regardless of how the client was built (module or built-in), use this
  command to mount the NFS/RDMA server:

  .. code-block:: sh

    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

  To verify that the mount is using RDMA, run "cat /proc/mounts" and check
  the "proto" field for the given mount.

  Congratulations! You're using NFS/RDMA!
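
The /proc/mounts check described above can also be done by a script. This is a
minimal sketch: mount_proto is a hypothetical helper that looks up the mount
point in the second field of each line and pulls the "proto" value out of the
comma-separated option list in the fourth field. The file to read can be
overridden, which makes it possible to try the function on saved output.

```shell
# Hypothetical helper: print the proto= option of the mount at directory $1,
# read from the /proc/mounts-style file $2 (default /proc/mounts).
# Prints nothing if the mount point is not listed or has no proto= option.
mount_proto() {
    mount_dir=$1
    mounts=${2:-/proc/mounts}
    awk -v mp="$mount_dir" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^proto=/) {
                    sub(/^proto=/, "", opts[i])
                    proto = opts[i]
                }
            }
        }
        END { if (proto != "") print proto }
    ' "$mounts"
}
```

With the mount command above, "mount_proto /mnt" is expected to print rdma
when the mount really uses the RDMA transport.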