Blame - Documentation/power/suspend-and-cpuhotplug.rst - SHIFTPHONES/mainline/linux

====================================================================
Interaction of Suspend code (S3) with the CPU hotplug infrastructure
====================================================================

I. Differences between CPU hotplug and Suspend-to-RAM
======================================================

How does the regular CPU hotplug code differ from how the Suspend-to-RAM
infrastructure uses it internally? And where do they share common code?

Well, a picture is worth a thousand words... So ASCII art follows :-)

[This depicts the current design in the kernel and focusses only on the
interactions involving the freezer and CPU hotplug. It also tries to explain
the locking involved and outlines the notifications that are sent.
Please note that only the call paths are illustrated here, with the aim
of describing where they take different paths and where they share code.
What happens when regular CPU hotplug and Suspend-to-RAM race with each other
is not depicted here.]

At a high level, the suspend-resume cycle goes like this::

  |Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |

More details follow::

                                Suspend call path
                                -----------------

                                  Write 'mem' to
                                /sys/power/state
                                    sysfs file
                                        |
                                        v
                               Acquire system_transition_mutex lock
                                        |
                                        v
                             Send PM_SUSPEND_PREPARE
                                   notifications
                                        |
                                        v
                                   Freeze tasks
                                        |
                                        |
                                        v
                              disable_nonboot_cpus()
                                   /* start */
                                        |
                                        v
                            Acquire cpu_add_remove_lock
                                        |
                                        v
                             Iterate over CURRENTLY
                                   online CPUs
                                        |
                                        |
                                        |                ----------
                                        v                          | L
             ======>               _cpu_down()                     |
            |              [This takes cpuhotplug.lock             |
  Common    |               before taking down the CPU             |
   code     |               and releases it when done]             | O
            |            While it is at it, notifications          |
            |            are sent when notable events occur,       |
             ======>     by running all registered callbacks.      |
                                        |                          | O
                                        |                          |
                                        |                          |
                                        v                          |
                            Note down these cpus in                | P
                                frozen_cpus mask         ----------
                                        |
                                        v
                           Disable regular cpu hotplug
                        by increasing cpu_hotplug_disabled
                                        |
                                        v
                            Release cpu_add_remove_lock
                                        |
                                        v
                       /* disable_nonboot_cpus() complete */
                                        |
                                        v
                                   Do suspend

Resuming follows the same pattern in reverse, with the counterparts being (in
the order of execution during resume):

* enable_nonboot_cpus() which involves::

   |  Acquire cpu_add_remove_lock
   |  Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug
   |  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
   |  Release cpu_add_remove_lock
   v

* thaw tasks
* send PM_POST_SUSPEND notifications
* Release system_transition_mutex lock.

Note that the system_transition_mutex lock is acquired at the very
beginning, when we are just starting to suspend, and is released only
after the entire cycle is complete (i.e., suspend + resume).
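
The ordering described above can be sketched as a small userspace model. This
is only an illustration under stated assumptions, not kernel code: the struct,
its fields and the ``*_sim`` function names are hypothetical stand-ins for the
locks, counters and masks named in the diagram, CPU 0 is assumed to be the boot
CPU, and locks are modelled as plain flags.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SIM 4   /* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for the kernel state named in the diagram. */
struct suspend_sim {
    bool system_transition_mutex;  /* held across the whole suspend + resume cycle */
    bool cpu_add_remove_lock;      /* held only inside the nonboot helpers */
    int cpu_hotplug_disabled;      /* > 0 means regular CPU hotplug is refused */
    unsigned int online_mask;      /* bit n set: CPU n is online */
    unsigned int frozen_cpus;      /* CPUs taken down by the suspend path */
    bool tasks_frozen;
};

/* Follows the disable_nonboot_cpus() steps; CPU 0 is assumed to be the boot CPU. */
static void disable_nonboot_cpus_sim(struct suspend_sim *s)
{
    s->cpu_add_remove_lock = true;
    s->frozen_cpus = 0;
    for (int cpu = 1; cpu < NR_CPUS_SIM; cpu++) {
        if (s->online_mask & (1u << cpu)) {
            s->online_mask &= ~(1u << cpu);  /* stands in for _cpu_down(cpu, 1) */
            s->frozen_cpus |= 1u << cpu;     /* note the CPU down in frozen_cpus */
        }
    }
    s->cpu_hotplug_disabled++;               /* disable regular CPU hotplug */
    s->cpu_add_remove_lock = false;
}

/* Follows the enable_nonboot_cpus() steps in the order listed for resume. */
static void enable_nonboot_cpus_sim(struct suspend_sim *s)
{
    s->cpu_add_remove_lock = true;
    s->cpu_hotplug_disabled--;               /* re-enable regular CPU hotplug */
    s->online_mask |= s->frozen_cpus;        /* stands in for _cpu_up() in a loop */
    s->frozen_cpus = 0;
    s->cpu_add_remove_lock = false;
}

/* One full cycle: the outer mutex is released only after resume completes. */
static void suspend_cycle_sim(struct suspend_sim *s)
{
    s->system_transition_mutex = true;
    /* PM_SUSPEND_PREPARE notifications would be sent here. */
    s->tasks_frozen = true;
    disable_nonboot_cpus_sim(s);
    /* "Do suspend": no simulated nonboot CPU is online at this point. */
    assert((s->online_mask & ((1u << NR_CPUS_SIM) - 1u) & ~1u) == 0);
    enable_nonboot_cpus_sim(s);
    s->tasks_frozen = false;
    /* PM_POST_SUSPEND notifications would be sent here. */
    s->system_transition_mutex = false;
}
```

Calling ``suspend_cycle_sim()`` on a state whose ``online_mask`` has CPUs 0, 1
and 3 set leaves the same three CPUs online afterwards, with
``cpu_hotplug_disabled`` back at zero.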

::

                          Regular CPU hotplug call path
                          -----------------------------

                                Write 0 (or 1) to
                       /sys/devices/system/cpu/cpu*/online
                                    sysfs file
                                        |
                                        |
                                        v
                                    cpu_down()
                                        |
                                        v
                           Acquire cpu_add_remove_lock
                                        |
                                        v
                          If cpu_hotplug_disabled > 0
                                return gracefully
                                        |
                                        |
                                        v
             ======>                _cpu_down()
            |              [This takes cpuhotplug.lock
  Common    |               before taking down the CPU
   code     |               and releases it when done]
            |            While it is at it, notifications
            |            are sent when notable events occur,
             ======>     by running all registered callbacks.
                                        |
                                        |
                                        v
                          Release cpu_add_remove_lock
                               [That's it!, for
                              regular CPU hotplug]

So, as can be seen from the two diagrams (the parts marked as "Common code"),
regular CPU hotplug and the suspend code path converge at the _cpu_down() and
_cpu_up() functions. They differ in the arguments passed to these functions:
during regular CPU hotplug, 0 is passed for the 'tasks_frozen' argument,
whereas during suspend, since the tasks are already frozen by the time
the non-boot CPUs are offlined or onlined, the _cpu_*() functions are called
with the 'tasks_frozen' argument set to 1.
[See below for some known issues regarding this.]
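
The convergence on the common code can be illustrated with a minimal userspace
sketch. The names ending in ``_sim`` and the struct are hypothetical, the
locking is left out, and the ``-EBUSY`` value used for "return gracefully" is
an assumption of this sketch rather than something stated in the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of the state consulted on the regular hotplug path. */
struct hotplug_sim {
    int cpu_hotplug_disabled;   /* raised by the suspend path */
    unsigned int online_mask;   /* bit n set: CPU n is online */
    int last_tasks_frozen;      /* records what the common code was passed */
};

/* Common code: reached from both call paths, differing only in tasks_frozen. */
static int _cpu_down_sim(struct hotplug_sim *s, unsigned int cpu, int tasks_frozen)
{
    assert(s != NULL);
    if (cpu >= 32 || !(s->online_mask & (1u << cpu)))
        return -EINVAL;
    s->last_tasks_frozen = tasks_frozen;  /* callbacks would see this value */
    s->online_mask &= ~(1u << cpu);
    return 0;
}

/* Regular hotplug entry point: always passes tasks_frozen = 0. */
static int cpu_down_sim(struct hotplug_sim *s, unsigned int cpu)
{
    assert(s != NULL);
    /* cpu_add_remove_lock would be acquired here. */
    if (s->cpu_hotplug_disabled > 0)
        return -EBUSY;  /* "return gracefully"; error code assumed for the sketch */
    return _cpu_down_sim(s, cpu, 0);
    /* cpu_add_remove_lock would be released on every exit path. */
}
```

The suspend path would instead call ``_cpu_down_sim(s, cpu, 1)`` directly, so
the only difference visible to the common code is the value of the last
argument.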

Important files and functions/entry points:
-------------------------------------------

- kernel/power/process.c : freeze_processes(), thaw_processes()
- kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
- kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()

II. What are the issues involved in CPU hotplug?
================================================

There are some interesting situations involving CPU hotplug and microcode
update on the CPUs, as discussed below:

[Please bear in mind that the kernel requests the microcode images from
userspace, using the request_firmware() function defined in
drivers/base/firmware_loader/main.c]

a. When all the CPUs are identical:

   This is the most common situation and it is quite straightforward: we want
   to apply the same microcode revision to each of the CPUs.
   To give an example of x86, the collect_cpu_info() function defined in
   arch/x86/kernel/microcode_core.c helps in discovering the type of the CPU
   and thereby in applying the correct microcode revision to it.
   But note that the kernel does not maintain a common microcode image for
   all the CPUs, in order to handle case 'b' described below.

b. When some of the CPUs are different than the rest:

   In this case, since we probably need to apply different microcode revisions
   to different CPUs, the kernel maintains a copy of the correct microcode
   image for each CPU (after appropriate CPU type/model discovery using
   functions such as collect_cpu_info()).

c. When a CPU is physically hot-unplugged and a new (and possibly different
   type of) CPU is hot-plugged into the system:

   In the current design of the kernel, whenever a CPU is taken offline during
   a regular CPU hotplug operation, upon receiving the CPU_DEAD notification
   (which is sent by the CPU hotplug code), the microcode update driver's
   callback for that event reacts by freeing the kernel's copy of the
   microcode image for that CPU.

   Hence, when a new CPU is brought online, since the kernel finds that it
   doesn't have the microcode image, it does the CPU type/model discovery
   afresh, requests the appropriate microcode image for that CPU from
   userspace, and then applies it.

   For example, in x86, the mc_cpu_callback() function (which is the microcode
   update driver's callback registered for CPU hotplug events) calls
   microcode_update_cpu(), which in this case calls microcode_init_cpu()
   instead of microcode_resume_cpu(), because it finds that the kernel doesn't
   have a valid microcode image. This ensures that the CPU type/model
   discovery is performed and the right microcode is applied to the CPU after
   getting it from userspace.

d. Handling microcode update during suspend/hibernate:

   Strictly speaking, during a CPU hotplug operation which does not involve
   physically removing or inserting CPUs, the CPUs are not actually powered
   off during a CPU offline. They are just put into the deepest C-states
   possible. Hence, in such a case, it is not really necessary to re-apply
   microcode when the CPUs are brought back online, since they wouldn't have
   lost the image during the CPU offline operation.

   This is the usual scenario encountered during a resume after a suspend.
   However, in the case of hibernation, since all the CPUs are completely
   powered off, during restore it becomes necessary to apply the microcode
   images to all the CPUs.

   [Note that we don't expect someone to physically pull out nodes and insert
   nodes with a different type of CPUs in-between a suspend-resume or a
   hibernate/restore cycle.]

   In the current design of the kernel, however, during a CPU offline operation
   as part of the suspend/hibernate cycle (cpuhp_tasks_frozen is set),
   the existing copy of the microcode image in the kernel is not freed.
   And during the CPU online operations (during resume/restore), since the
   kernel finds that it already has copies of the microcode images for all the
   CPUs, it just applies them to the CPUs, avoiding any re-discovery of CPU
   type/model and the need for validating whether the microcode revisions are
   right for the CPUs or not (due to the above assumption that physical CPU
   hotplug will not be done in-between suspend/resume or hibernate/restore
   cycles).
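
The lifecycle of the per-CPU microcode copy described in cases 'c' and 'd' can
be sketched as follows. This is a userspace illustration only: the struct, the
enum and the ``*_sim`` functions are hypothetical and merely mirror the
decisions attributed above to the CPU_DEAD callback and to
microcode_update_cpu(); no real firmware is requested or applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SIM 4   /* hypothetical CPU count for this sketch */

/* Which path the online step took, named after the functions in the text. */
enum ucode_path { UCODE_INIT, UCODE_RESUME };

struct ucode_sim {
    bool image_valid[NR_CPUS_SIM];  /* does the kernel hold a copy for this CPU? */
    int discoveries;                /* number of type/model discoveries performed */
};

/* CPU_DEAD handling: the cached copy is freed only on regular hotplug. */
static void ucode_cpu_dead_sim(struct ucode_sim *u, int cpu, bool tasks_frozen)
{
    assert(u != NULL && cpu >= 0 && cpu < NR_CPUS_SIM);
    if (!tasks_frozen)
        u->image_valid[cpu] = false;
    /* With tasks_frozen set (suspend/hibernate), the copy is kept. */
}

/* Online handling: reuse a cached copy if present, otherwise start afresh. */
static enum ucode_path ucode_cpu_online_sim(struct ucode_sim *u, int cpu)
{
    assert(u != NULL && cpu >= 0 && cpu < NR_CPUS_SIM);
    if (u->image_valid[cpu])
        return UCODE_RESUME;        /* apply the existing copy, no discovery */
    u->discoveries++;               /* discovery, then a request to userspace */
    u->image_valid[cpu] = true;
    return UCODE_INIT;
}
```

In this model, a regular offline followed by an online takes the
``UCODE_INIT`` path and performs a discovery, while an offline with
``tasks_frozen`` set followed by an online takes the ``UCODE_RESUME`` path.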

III. Known problems
===================

Are there any known problems when regular CPU hotplug and suspend race
with each other?

Yes, they are listed below:

1. When invoking regular CPU hotplug, the 'tasks_frozen' argument passed to
   the _cpu_down() and _cpu_up() functions is *always* 0.
   This might not reflect the true current state of the system, since the
   tasks could have been frozen by an out-of-band event such as a suspend
   operation in progress. Hence, the cpuhp_tasks_frozen variable will not
   reflect the frozen state and the CPU hotplug callbacks which evaluate
   that variable might execute the wrong code path.

2. If a regular CPU hotplug stress test happens to race with the freezer due
   to a suspend operation in progress at the same time, then we could hit the
   situation described below:

    * A regular cpu online operation continues its journey from userspace
      into the kernel, since the freezing has not yet begun.
    * Then the freezer gets to work and freezes userspace.
    * If cpu online has not yet completed the microcode update by now,
      it will start waiting on the frozen userspace in the
      TASK_UNINTERRUPTIBLE state, in order to get the microcode image.
    * Now the freezer continues and tries to freeze the remaining tasks. But
      due to the wait mentioned above, the freezer won't be able to freeze
      the cpu online hotplug task and hence freezing of tasks fails.

   As a result of this task freezing failure, the suspend operation gets
   aborted.
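
The race in item 2 can be modelled deterministically in userspace. The sketch
below is a simplification under stated assumptions: the struct and the
``freeze_tasks_sim()`` function are hypothetical, the freezer is reduced to two
passes over an array, and thawing every task on failure is a choice made for
this sketch to represent the aborted suspend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model: only the properties that matter for the race. */
struct task_sim {
    bool is_userspace_helper;    /* would supply the microcode image */
    bool needs_userspace_reply;  /* cpu online task waiting for that image */
    bool frozen;
};

/* Returns true if every task could be frozen, false if freezing failed. */
static bool freeze_tasks_sim(struct task_sim *tasks, size_t n)
{
    bool ok = true;

    assert(tasks != NULL);

    /* Pass 1: the freezer freezes userspace first. */
    for (size_t i = 0; i < n; i++)
        if (tasks[i].is_userspace_helper)
            tasks[i].frozen = true;

    /*
     * Pass 2: the remaining tasks. A task waiting uninterruptibly on the
     * now-frozen userspace can never reach a point where it freezes.
     */
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].is_userspace_helper)
            continue;
        if (tasks[i].needs_userspace_reply)
            ok = false;
        else
            tasks[i].frozen = true;
    }

    /* On failure, undo the partial freeze; the suspend would be aborted. */
    if (!ok)
        for (size_t i = 0; i < n; i++)
            tasks[i].frozen = false;

    return ok;
}
```

With one helper task, one ordinary task and one task whose
``needs_userspace_reply`` flag is set, ``freeze_tasks_sim()`` returns false;
clearing that flag lets the same call succeed.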