.. Revision history (from the blame view): Sudeep Dutt, 7df20f2, 2015-04-29;
   Mauro Carvalho Chehab, 09bbf05, 2019-06-12

========================================
Symmetric Communication Interface (SCIF)
========================================

The Symmetric Communication Interface (SCIF, pronounced "skiff") is a
low-level communication API over PCIe, currently implemented for MIC. SCIF
currently provides inter-node communication within a single host platform,
where a node is either a MIC coprocessor or a Xeon-based host. SCIF abstracts
the details of communicating over the PCIe bus while providing an API that is
symmetric across all the nodes in the PCIe network. An important design
objective for SCIF is to deliver the maximum performance that the
communication capabilities of the hardware allow. SCIF has been used to
implement an offload compiler runtime and OFED support for MPI implementations
on MIC coprocessors.

SCIF API Components
===================

The SCIF API has the following parts:

1. Connection establishment using a client-server model
2. Byte-stream messaging intended for short messages
3. Node enumeration to determine online nodes
4. Poll semantics for detection of incoming connections and messages
5. Memory registration to pin down pages
6. Remote memory mapping for low-latency CPU accesses via mmap
7. Remote DMA (RDMA) for high-bandwidth DMA transfers
8. Fence APIs for RDMA synchronization

SCIF exposes the notion of a connection, which peer processes on nodes in a
SCIF PCIe "network" can use to share memory "windows" and to communicate. A
process on a SCIF node initiates a SCIF connection to a peer process on a
different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
that are similar to connection-oriented socket APIs. Connected SCIF endpoints
can also register local memory, after which data can be transferred using
DMA, CPU copies, or remote memory mapping via mmap. SCIF supports both
user-mode and kernel-mode clients, which are functionally equivalent.
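Because SCIF messaging is described above as similar to connection-oriented
socket APIs, the order of calls can be illustrated with ordinary POSIX
sockets. The following is a rough analogy only, not SCIF code: it uses TCP
over the loopback interface inside a single process, and the function name
``sock_roundtrip`` is made up for this sketch. The comments name the SCIF call
that plays the corresponding role; the SCIF signatures are not shown here and
should be taken from scif.h.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Analogy only: plain POSIX TCP sockets on loopback, NOT the SCIF API.
 * Sends msg from a "client" socket to a "server" socket and copies it to out.
 * Returns the number of bytes received, or -1 on error.
 */
int sock_roundtrip(const char *msg, char *out, size_t outlen)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	size_t len;
	ssize_t n = -1;
	int lfd = -1, cfd = -1, afd = -1;

	assert(msg != NULL && out != NULL);
	len = strlen(msg);
	if (len == 0 || len >= outlen)
		return -1;	/* need room for the terminating NUL */

	lfd = socket(AF_INET, SOCK_STREAM, 0);	/* role of scif_open() */
	if (lfd < 0)
		goto done;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;			/* let the kernel pick a port */
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto done;			/* role of scif_bind() */
	if (listen(lfd, 1) < 0)			/* role of scif_listen() */
		goto done;
	if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
		goto done;			/* learn the chosen port */

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0)
		goto done;
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto done;			/* role of scif_connect() */
	afd = accept(lfd, NULL, NULL);		/* role of scif_accept() */
	if (afd < 0)
		goto done;

	/* Short message exchange: role of scif_send() and scif_recv(). */
	if (send(cfd, msg, len, 0) != (ssize_t)len)
		goto done;
	n = recv(afd, out, len, MSG_WAITALL);
	if (n >= 0)
		out[n] = '\0';
done:
	if (afd >= 0)
		close(afd);		/* role of scif_close() */
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return (int)n;
}
```

A caller would pass a short string and a buffer, for example
``sock_roundtrip("ping", buf, sizeof(buf))``. In a real SCIF program the two
sides would be separate processes on different nodes rather than two sockets
in one process.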

SCIF Performance for MIC
========================

A DMA bandwidth comparison between the TCP (over Ethernet over PCIe) stack and
SCIF shows the performance advantages of SCIF for HPC applications and
runtimes::

  Comparison of TCP and SCIF based BW (plot)

  Y axis: Throughput (GB/sec), 0 to 8
  X axis: Transfer Size (KBytes), 1 to 100000, logarithmic
  PCIe Bandwidth (*): flat at 7 GB/sec
  TCP (#): flat, below 1 GB/sec
  SCIF (%): from just above 1 GB/sec up to between 6 and 7 GB/sec

SCIF allows memory sharing via mmap(..) between processes on different PCIe
nodes and thus provides bare-metal PCIe latency. The round-trip SCIF mmap
latency from the host to an x100 MIC for an 8-byte message is 0.44 usecs.
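The idea of sharing memory through a mapping can be sketched with standard
POSIX calls. This is an analogy under stated assumptions, not SCIF code: it
shares an anonymous mapping between a parent and a child process on one host,
whereas the text above is about windows on different PCIe nodes. The function
name ``shared_window_demo`` is hypothetical.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Analogy only: a shared anonymous mapping between two local processes,
 * NOT a SCIF remote mapping. The child stores an 8-byte value into the
 * shared "window" with a plain CPU store; the parent reads it back.
 * Returns 0 on success and -1 on error.
 */
int shared_window_demo(uint64_t value, uint64_t *seen)
{
	uint64_t *win;
	pid_t pid;
	int status;
	int rc = -1;

	win = mmap(NULL, sizeof(*win), PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (win == MAP_FAILED)
		return -1;
	*win = 0;

	pid = fork();
	if (pid == 0) {
		*win = value;	/* no system call on the data path */
		_exit(0);
	}
	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0) {
		*seen = *win;	/* the child's store is visible here */
		rc = 0;
	}

	munmap(win, sizeof(*win));
	return rc;
}
```

Once the mapping exists, the data path is an ordinary load or store, which is
the property the latency figure above relies on. Setting up the mapping is the
only step that involves the kernel in this sketch.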

SCIF has a user-space library, a thin IOCTL wrapper that provides a user-space
API similar to the kernel API in scif.h. The SCIF user-space library is
distributed at https://software.intel.com/en-us/mic-developer

The following pseudo code shows how two applications on two PCIe nodes would
typically use the SCIF API::

  Process A (on node A)                 Process B (on node B)

  /* get online node information */
  scif_get_node_ids(..)                 scif_get_node_ids(..)
  scif_open(..)                         scif_open(..)
  scif_bind(..)                         scif_bind(..)
  scif_listen(..)
  scif_accept(..)                       scif_connect(..)
  /* SCIF connection established */

  /* Send and receive short messages */
  scif_send(..)/scif_recv(..)           scif_send(..)/scif_recv(..)

  /* Register memory */
  scif_register(..)                     scif_register(..)

  /* RDMA */
  scif_readfrom(..)/scif_writeto(..)    scif_readfrom(..)/scif_writeto(..)

  /* Fence DMAs */
  scif_fence_signal(..)                 scif_fence_signal(..)

  mmap(..)                              mmap(..)

  /* Access remote registered memory */

  /* Close the endpoints */
  scif_close(..)                        scif_close(..)