scsi: lpfc: Rework EQ/CQ processing to address interrupt coalescing
When driving high iop counts, auto_imax coalescing kicks in and drives the
performance to extremely small iops levels.
There are two issues:
1) auto_imax is enabled by default. The auto algorithm, when iops gets
high, divides the iops by the hdwq count and uses that value to
calculate EQ_Delay. The EQ_Delay is set uniformly on all EQs whether
they have load or not. The EQ_delay is only manipulated every 5s (a
long time). Thus there were large 5s swings of no interrupt delay
followed by large/maximum delay, before repeating.
2) When processing a CQ, the driver got mixed up on the rate of when
to ring the doorbell to keep the chip appraised of the eqe or cqe
consumption as well as how how long to sit in the thread and
process queue entries. Currently, the driver capped its work at
64 entries (very small) and exited/rearmed the CQ. Thus, on heavy
loads, additional overheads were taken to exit and re-enter the
interrupt handler. Worse, if in the large/maximum coalescing
windows,k it could be a while before getting back to servicing.
The issues are corrected by the following:
- A change in defaults. Auto_imax is turned OFF and fcp_imax is set
to 0. Thus all interrupts are immediate.
- Cleanup of field names and their meanings. Existing names were
non-intuitive or used for duplicate things.
- Added max_proc_limit field, to control the length of time the
handlers would service completions.
- Reworked EQ handling:
Added common routine that walks eq, applying notify interval and max
processing limits. Use queue_claimed to claim ownership of the queue
while processing. Always rearm the queue whenever the common routine
is called.
Rework queue element processing, namely to eliminate hba_index vs
host_index. Only one index is necessary. The queue entry can be
marked invalid and the host_index updated immediately after eqe
processing.
After rework, xx_release routines are now DB write functions. Renamed
the routines as such.
Moved lpfc_sli4_eq_flush(), which does similar action, to same area.
Replaced the 2 individual loops that walk an eq with a call to the
common routine.
Slightly revised lpfc_sli4_hba_handle_eqe() calling syntax.
Added per-cpu counters to detect interrupt rates and scale
interrupt coalescing values.
- Reworked CQ handling:
Added common routine that walks cq, applying notify interval and max
processing limits. Use queue_claimed to claim ownership of the queue
while processing. Always rearm the queue whenever the common routine
is called.
Rework queue element processing, namely to eliminate hba_index vs
host_index. Only one index is necessary. The queue entry can be
marked invalid and the host_index updated immediately after cqe
processing.
After rework, xx_release routines are now DB write functions. Renamed
the routines as such.
Replaced the 3 individual loops that walk a cq with a call to the
common routine.
Redefined lpfc_sli4_sp_handle_mcqe() to commong handler definition with
queue reference. Add increment for mbox completion to handler.
- Added a new module/sysfs attribute: lpfc_cq_max_proc_limit To allow
dynamic changing of the CQ max_proc_limit value being used.
Although this leaves an EQ as an immediate interrupt, that interrupt will
only occur if a CQ bound to it is in an armed state and has cqe's to
process. By staying in the cq processing routine longer, high loads will
avoid generating more interrupts as they will only rearm as the processing
thread exits. The immediately interrupt is also beneficial to idle or
lower-processing CQ's as they get serviced immediately without being
penalized by sharing an EQ with a more loaded CQ.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 9fd2811..0bc4981 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -686,6 +686,7 @@ struct lpfc_hba {
struct lpfc_sli4_hba sli4_hba;
struct workqueue_struct *wq;
+ struct delayed_work eq_delay_work;
struct lpfc_sli sli;
uint8_t pci_dev_grp; /* lpfc PCI dev group: 0x0, 0x1, 0x2,... */
@@ -789,7 +790,6 @@ struct lpfc_hba {
uint8_t nvmet_support; /* driver supports NVMET */
#define LPFC_NVMET_MAX_PORTS 32
uint8_t mds_diags_support;
- uint32_t initial_imax;
uint8_t bbcredit_support;
uint8_t enab_exp_wqcq_pages;
@@ -817,6 +817,8 @@ struct lpfc_hba {
uint32_t cfg_use_msi;
uint32_t cfg_auto_imax;
uint32_t cfg_fcp_imax;
+ uint32_t cfg_cq_poll_threshold;
+ uint32_t cfg_cq_max_proc_limit;
uint32_t cfg_fcp_cpu_map;
uint32_t cfg_hdw_queue;
uint32_t cfg_irq_chann;
@@ -1084,7 +1086,6 @@ struct lpfc_hba {
uint8_t temp_sensor_support;
/* Fields used for heart beat. */
- unsigned long last_eqdelay_time;
unsigned long last_completion_time;
unsigned long skipped_hb;
struct timer_list hb_tmofunc;
@@ -1287,3 +1288,23 @@ lpfc_phba_elsring(struct lpfc_hba *phba)
}
return &phba->sli.sli3_ring[LPFC_ELS_RING];
}
+
+/**
+ * lpfc_sli4_mod_hba_eq_delay - update EQ delay
+ * @phba: Pointer to HBA context object.
+ * @q: The Event Queue to update.
+ * @delay: The delay value (in us) to be written.
+ *
+ **/
+static inline void
+lpfc_sli4_mod_hba_eq_delay(struct lpfc_hba *phba, struct lpfc_queue *eq,
+ u32 delay)
+{
+ struct lpfc_register reg_data;
+
+ reg_data.word0 = 0;
+ bf_set(lpfc_sliport_eqdelay_id, ®_data, eq->queue_id);
+ bf_set(lpfc_sliport_eqdelay_delay, ®_data, delay);
+ writel(reg_data.word0, phba->sli4_hba.u.if_type2.EQDregaddr);
+ eq->q_mode = delay;
+}