Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 915ae8c..1d65604 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -358,7 +358,8 @@
quilt trees:
- USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de>
kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
-
+ - x86-64, partly i386, Andi Kleen <ak@suse.de>
+ ftp.firstfloor.org:/pub/ak/x86_64/quilt/
Bug Reporting
-------------
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 99902ae6..7db71d6 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1124,11 +1124,15 @@
NMI switch that most IA32 servers have fires unknown NMI up, for example.
If a system hangs up, try pressing the NMI switch.
-[NOTE]
- This function and oprofile share a NMI callback. Therefore this function
- cannot be enabled when oprofile is activated.
- And NMI watchdog will be disabled when the value in this file is set to
- non-zero.
+nmi_watchdog
+------------
+
+Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero
+the NMI watchdog is enabled and will continuously test all online cpus to
+determine whether or not they are still functioning properly.
+
+Because the NMI watchdog shares registers with oprofile, by disabling the NMI
+watchdog, oprofile may have more registers to utilize.
2.4 /proc/sys/vm - The virtual memory subsystem
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index b7d6abb..e2cbd59 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -421,6 +421,11 @@
The second argument is optional, and if supplied will be used
if first argument is not supported.
+ as-instr
+ as-instr checks if the assembler reports a specific instruction
+ and then outputs either option1 or option2
+ C escapes are supported in the test instruction
+
cc-option
cc-option is used to check if $(CC) supports a given option, and not
supported to use an optional second option.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 766abda..c918cc3 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1240,7 +1240,11 @@
bootloader. This is currently used on
IXP2000 systems where the bus has to be
configured a certain way for adjunct CPUs.
-
+ noearly [X86] Don't do any early type 1 scanning.
+ This might help on some broken boards which
+ machine check when some devices' config space
+ is read. But various workarounds are disabled
+ and some IOMMU drivers will not work.
pcmv= [HW,PCMCIA] BadgePAD 4
pd. [PARIDE]
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 6da24e7..4303e0c 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -245,6 +245,13 @@
newfallback: use new unwinder but fall back to old if it gets
stuck (default)
+ call_trace=[old|both|newfallback|new]
+ old: use old inexact backtracer
+ new: use new exact dwarf2 unwinder
+ both: print entries from both
+ newfallback: use new unwinder but fall back to old if it gets
+ stuck (default)
+
Misc
noreplacement Don't replace instructions with more appropriate ones
diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86_64/kernel-stacks
new file mode 100644
index 0000000..bddfddd
--- /dev/null
+++ b/Documentation/x86_64/kernel-stacks
@@ -0,0 +1,99 @@
+Most of the text from Keith Owens, hacked by AK
+
+x86_64 page size (PAGE_SIZE) is 4K.
+
+Like all other architectures, x86_64 has a kernel stack for every
+active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big.
+These stacks contain useful data as long as a thread is alive or a
+zombie. While the thread is in user space the kernel stack is empty
+except for the thread_info structure at the bottom.
+
+In addition to the per thread stacks, there are specialized stacks
+associated with each cpu. These stacks are only used while the kernel
+is in control on that cpu, when a cpu returns to user space the
+specialized stacks contain no useful data. The main cpu stacks is
+
+* Interrupt stack. IRQSTACKSIZE
+
+ Used for external hardware interrupts. If this is the first external
+ hardware interrupt (i.e. not a nested hardware interrupt) then the
+ kernel switches from the current task to the interrupt stack. Like
+ the split thread and interrupt stacks on i386 (with CONFIG_4KSTACKS),
+ this gives more room for kernel interrupt processing without having
+ to increase the size of every per thread stack.
+
+ The interrupt stack is also used when processing a softirq.
+
+Switching to the kernel interrupt stack is done by software based on a
+per CPU interrupt nest counter. This is needed because x86-64 "IST"
+hardware stacks cannot nest without races.
+
+x86_64 also has a feature which is not available on i386, the ability
+to automatically switch to a new stack for designated events such as
+double fault or NMI, which makes it easier to handle these unusual
+events on x86_64. This feature is called the Interrupt Stack Table
+(IST). There can be up to 7 IST entries per cpu. The IST code is an
+index into the Task State Segment (TSS), the IST entries in the TSS
+point to dedicated stacks, each stack can be a different size.
+
+An IST is selected by an non-zero value in the IST field of an
+interrupt-gate descriptor. When an interrupt occurs and the hardware
+loads such a descriptor, the hardware automatically sets the new stack
+pointer based on the IST value, then invokes the interrupt handler. If
+software wants to allow nested IST interrupts then the handler must
+adjust the IST values on entry to and exit from the interrupt handler.
+(this is occasionally done, e.g. for debug exceptions)
+
+Events with different IST codes (i.e. with different stacks) can be
+nested. For example, a debug interrupt can safely be interrupted by an
+NMI. arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack
+pointers on entry to and exit from all IST events, in theory allowing
+IST events with the same code to be nested. However in most cases, the
+stack size allocated to an IST assumes no nesting for the same code.
+If that assumption is ever broken then the stacks will become corrupt.
+
+The currently assigned IST stacks are :-
+
+* STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
+
+ Used for interrupt 12 - Stack Fault Exception (#SS).
+
+ This allows to recover from invalid stack segments. Rarely
+ happens.
+
+* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
+
+ Used for interrupt 8 - Double Fault Exception (#DF).
+
+ Invoked when handling a exception causes another exception. Happens
+ when the kernel is very confused (e.g. kernel stack pointer corrupt)
+ Using a separate stack allows to recover from it well enough in many
+ cases to still output an oops.
+
+* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
+
+ Used for non-maskable interrupts (NMI).
+
+ NMI can be delivered at any time, including when the kernel is in the
+ middle of switching stacks. Using IST for NMI events avoids making
+ assumptions about the previous state of the kernel stack.
+
+* DEBUG_STACK. DEBUG_STKSZ
+
+ Used for hardware debug interrupts (interrupt 1) and for software
+ debug interrupts (INT3).
+
+ When debugging a kernel, debug interrupts (both hardware and
+ software) can occur at any time. Using IST for these interrupts
+ avoids making assumptions about the previous state of the kernel
+ stack.
+
+* MCE_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
+
+ Used for interrupt 18 - Machine Check Exception (#MC).
+
+ MCE can be delivered at any time, including when the kernel is in the
+ middle of switching stacks. Using IST for MCE events avoids making
+ assumptions about the previous state of the kernel stack.
+
+For more details see the Intel IA32 or AMD AMD64 architecture manuals.
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 6189b0c..758044f 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -166,7 +166,6 @@
config X86_GENERICARCH
bool "Generic architecture (Summit, bigsmp, ES7000, default)"
- depends on SMP
help
This option compiles in the Summit, bigsmp, ES7000, default subarchitectures.
It is intended for a generic binary kernel.
@@ -263,7 +262,7 @@
config X86_UP_APIC
bool "Local APIC support on uniprocessors"
- depends on !SMP && !(X86_VISWS || X86_VOYAGER)
+ depends on !SMP && !(X86_VISWS || X86_VOYAGER || X86_GENERICARCH)
help
A local APIC (Advanced Programmable Interrupt Controller) is an
integrated interrupt controller in the CPU. If you have a single-CPU
@@ -288,12 +287,12 @@
config X86_LOCAL_APIC
bool
- depends on X86_UP_APIC || ((X86_VISWS || SMP) && !X86_VOYAGER)
+ depends on X86_UP_APIC || ((X86_VISWS || SMP) && !X86_VOYAGER) || X86_GENERICARCH
default y
config X86_IO_APIC
bool
- depends on X86_UP_IOAPIC || (SMP && !(X86_VISWS || X86_VOYAGER))
+ depends on X86_UP_IOAPIC || (SMP && !(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH
default y
config X86_VISWS_APIC
@@ -741,8 +740,7 @@
source kernel/Kconfig.hz
config KEXEC
- bool "kexec system call (EXPERIMENTAL)"
- depends on EXPERIMENTAL
+ bool "kexec system call"
help
kexec is a system call that implements the ability to shutdown your
current kernel, and to start another kernel. It is like a reboot
@@ -763,6 +761,13 @@
depends on HIGHMEM
help
Generate crash dump after being started by kexec.
+ This should be normally only set in special crash dump kernels
+ which are loaded in the main kernel with kexec-tools into
+ a specially reserved region and then later executed after
+ a crash by kdump/kexec. The crash dump kernel must be compiled
+ to a memory address not used by the main kernel or BIOS using
+ PHYSICAL_START.
+ For more details see Documentation/kdump/kdump.txt
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
diff --git a/arch/i386/Makefile b/arch/i386/Makefile
index 3e4adb1..7cc0b18 100644
--- a/arch/i386/Makefile
+++ b/arch/i386/Makefile
@@ -46,6 +46,14 @@
# a lot more stack due to the lack of sharing of stacklots:
CFLAGS += $(shell if [ $(call cc-version) -lt 0400 ] ; then echo $(call cc-option,-fno-unit-at-a-time); fi ;)
+# do binutils support CFI?
+cflags-y += $(call as-instr,.cfi_startproc\n.cfi_endproc,-DCONFIG_AS_CFI=1,)
+AFLAGS += $(call as-instr,.cfi_startproc\n.cfi_endproc,-DCONFIG_AS_CFI=1,)
+
+# is .cfi_signal_frame supported too?
+cflags-y += $(call as-instr,.cfi_startproc\n.cfi_endproc,-DCONFIG_AS_CFI=1,)
+AFLAGS += $(call as-instr,.cfi_startproc\n.cfi_endproc,-DCONFIG_AS_CFI=1,)
+
CFLAGS += $(cflags-y)
# Default subarch .c files
diff --git a/arch/i386/boot/edd.S b/arch/i386/boot/edd.S
index 4b84ea2..3432136 100644
--- a/arch/i386/boot/edd.S
+++ b/arch/i386/boot/edd.S
@@ -15,42 +15,95 @@
#include <asm/setup.h>
#if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
+
+# It is assumed that %ds == INITSEG here
+
movb $0, (EDD_MBR_SIG_NR_BUF)
movb $0, (EDDNR)
-# Check the command line for two options:
+# Check the command line for options:
# edd=of disables EDD completely (edd=off)
# edd=sk skips the MBR test (edd=skipmbr)
+# edd=on re-enables EDD (edd=on)
+
pushl %esi
- cmpl $0, %cs:cmd_line_ptr
- jz done_cl
+ movw $edd_mbr_sig_start, %di # Default to edd=on
+
movl %cs:(cmd_line_ptr), %esi
-# ds:esi has the pointer to the command line now
- movl $(COMMAND_LINE_SIZE-7), %ecx
-# loop through kernel command line one byte at a time
-cl_loop:
- cmpl $EDD_CL_EQUALS, (%si)
+ andl %esi, %esi
+ jz old_cl # Old boot protocol?
+
+# Convert to a real-mode pointer in fs:si
+ movl %esi, %eax
+ shrl $4, %eax
+ movw %ax, %fs
+ andw $0xf, %si
+ jmp have_cl_pointer
+
+# Old-style boot protocol?
+old_cl:
+ push %ds # aka INITSEG
+ pop %fs
+
+ cmpw $0xa33f, (0x20)
+ jne done_cl # No command line at all?
+ movw (0x22), %si # Pointer relative to INITSEG
+
+# fs:si has the pointer to the command line now
+have_cl_pointer:
+
+# Loop through kernel command line one byte at a time. Just in
+# case the loader is buggy and failed to null-terminate the command line
+# terminate if we get close enough to the end of the segment that we
+# cannot fit "edd=XX"...
+cl_atspace:
+ cmpw $-5, %si # Watch for segment wraparound
+ jae done_cl
+ movl %fs:(%si), %eax
+ andb %al, %al # End of line?
+ jz done_cl
+ cmpl $EDD_CL_EQUALS, %eax
jz found_edd_equals
- incl %esi
- loop cl_loop
- jmp done_cl
+ cmpb $0x20, %al # <= space consider whitespace
+ ja cl_skipword
+ incw %si
+ jmp cl_atspace
+
+cl_skipword:
+ cmpw $-5, %si # Watch for segment wraparound
+ jae done_cl
+ movb %fs:(%si), %al # End of string?
+ andb %al, %al
+ jz done_cl
+ cmpb $0x20, %al
+ jbe cl_atspace
+ incw %si
+ jmp cl_skipword
+
found_edd_equals:
# only looking at first two characters after equals
- addl $4, %esi
- cmpw $EDD_CL_OFF, (%si) # edd=of
- jz do_edd_off
- cmpw $EDD_CL_SKIP, (%si) # edd=sk
- jz do_edd_skipmbr
- jmp done_cl
+# late overrides early on the command line, so keep going after finding something
+ movw %fs:4(%si), %ax
+ cmpw $EDD_CL_OFF, %ax # edd=of
+ je do_edd_off
+ cmpw $EDD_CL_SKIP, %ax # edd=sk
+ je do_edd_skipmbr
+ cmpw $EDD_CL_ON, %ax # edd=on
+ je do_edd_on
+ jmp cl_skipword
do_edd_skipmbr:
- popl %esi
- jmp edd_start
+ movw $edd_start, %di
+ jmp cl_skipword
do_edd_off:
- popl %esi
- jmp edd_done
+ movw $edd_done, %di
+ jmp cl_skipword
+do_edd_on:
+ movw $edd_mbr_sig_start, %di
+ jmp cl_skipword
+
done_cl:
popl %esi
-
+ jmpw *%di
# Read the first sector of each BIOS disk device and store the 4-byte signature
edd_mbr_sig_start:
diff --git a/arch/i386/boot/setup.S b/arch/i386/boot/setup.S
index d2b684c..3aec4538 100644
--- a/arch/i386/boot/setup.S
+++ b/arch/i386/boot/setup.S
@@ -494,12 +494,12 @@
movw %cs, %ax # aka SETUPSEG
subw $DELTA_INITSEG, %ax # aka INITSEG
movw %ax, %ds
- movw $0, (0x1ff) # default is no pointing device
+ movb $0, (0x1ff) # default is no pointing device
int $0x11 # int 0x11: equipment list
testb $0x04, %al # check if mouse installed
jz no_psmouse
- movw $0xAA, (0x1ff) # device present
+ movb $0xAA, (0x1ff) # device present
no_psmouse:
#if defined(CONFIG_X86_SPEEDSTEP_SMI) || defined(CONFIG_X86_SPEEDSTEP_SMI_MODULE)
diff --git a/arch/i386/defconfig b/arch/i386/defconfig
index 89ebb7a..1a29bfa 100644
--- a/arch/i386/defconfig
+++ b/arch/i386/defconfig
@@ -1,41 +1,51 @@
#
# Automatically generated make config: don't edit
+# Linux kernel version: 2.6.18-git5
+# Tue Sep 26 09:30:47 2006
#
CONFIG_X86_32=y
+CONFIG_GENERIC_TIME=y
+CONFIG_LOCKDEP_SUPPORT=y
+CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
+CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
+CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
-CONFIG_BROKEN_ON_SMP=y
+CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
#
# General setup
#
CONFIG_LOCALVERSION=""
-# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
-# CONFIG_POSIX_MQUEUE is not set
+CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
-CONFIG_SYSCTL=y
+# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
+# CONFIG_CPUSETS is not set
+# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
-CONFIG_UID16=y
-CONFIG_VM86=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_EMBEDDED is not set
+CONFIG_UID16=y
+CONFIG_SYSCTL=y
CONFIG_KALLSYMS=y
+CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
@@ -45,11 +55,9 @@
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
-CONFIG_CC_ALIGN_FUNCTIONS=0
-CONFIG_CC_ALIGN_LABELS=0
-CONFIG_CC_ALIGN_LOOPS=0
-CONFIG_CC_ALIGN_JUMPS=0
CONFIG_SLAB=y
+CONFIG_VM_EVENT_COUNTERS=y
+CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set
@@ -60,41 +68,45 @@
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
-CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
+CONFIG_STOP_MACHINE=y
#
# Block layer
#
-# CONFIG_LBD is not set
+CONFIG_LBD=y
+# CONFIG_BLK_DEV_IO_TRACE is not set
+# CONFIG_LSF is not set
#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
-# CONFIG_IOSCHED_AS is not set
-# CONFIG_IOSCHED_DEADLINE is not set
+CONFIG_IOSCHED_AS=y
+CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
-# CONFIG_DEFAULT_AS is not set
+CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
-CONFIG_DEFAULT_CFQ=y
+# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
-CONFIG_DEFAULT_IOSCHED="cfq"
+CONFIG_DEFAULT_IOSCHED="anticipatory"
#
# Processor type and features
#
-CONFIG_X86_PC=y
+CONFIG_SMP=y
+# CONFIG_X86_PC is not set
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
-# CONFIG_X86_GENERICARCH is not set
+CONFIG_X86_GENERICARCH=y
# CONFIG_X86_ES7000 is not set
+CONFIG_X86_CYCLONE_TIMER=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
@@ -102,11 +114,11 @@
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
-# CONFIG_MPENTIUMIII is not set
+CONFIG_MPENTIUMIII=y
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
-CONFIG_MK7=y
+# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
@@ -117,10 +129,10 @@
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
-# CONFIG_X86_GENERIC is not set
+CONFIG_X86_GENERIC=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
-CONFIG_X86_L1_CACHE_SHIFT=6
+CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
@@ -131,26 +143,28 @@
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
-CONFIG_X86_USE_3DNOW=y
CONFIG_X86_TSC=y
-# CONFIG_HPET_TIMER is not set
-# CONFIG_SMP is not set
-CONFIG_PREEMPT_NONE=y
-# CONFIG_PREEMPT_VOLUNTARY is not set
+CONFIG_HPET_TIMER=y
+CONFIG_HPET_EMULATE_RTC=y
+CONFIG_NR_CPUS=32
+CONFIG_SCHED_SMT=y
+CONFIG_SCHED_MC=y
+# CONFIG_PREEMPT_NONE is not set
+CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
-CONFIG_X86_UP_APIC=y
-CONFIG_X86_UP_IOAPIC=y
+CONFIG_PREEMPT_BKL=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
-# CONFIG_X86_MCE_P4THERMAL is not set
+CONFIG_X86_MCE_P4THERMAL=y
+CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
-# CONFIG_MICROCODE is not set
-# CONFIG_X86_MSR is not set
-# CONFIG_X86_CPUID is not set
+CONFIG_MICROCODE=y
+CONFIG_X86_MSR=y
+CONFIG_X86_CPUID=y
#
# Firmware Drivers
@@ -158,68 +172,67 @@
# CONFIG_EDD is not set
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
-CONFIG_NOHIGHMEM=y
-# CONFIG_HIGHMEM4G is not set
+# CONFIG_NOHIGHMEM is not set
+CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
-CONFIG_VMSPLIT_3G=y
-# CONFIG_VMSPLIT_3G_OPT is not set
-# CONFIG_VMSPLIT_2G is not set
-# CONFIG_VMSPLIT_1G is not set
CONFIG_PAGE_OFFSET=0xC0000000
-CONFIG_ARCH_FLATMEM_ENABLE=y
-CONFIG_ARCH_SPARSEMEM_ENABLE=y
-CONFIG_ARCH_SELECT_MEMORY_MODEL=y
+CONFIG_HIGHMEM=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
-CONFIG_SPARSEMEM_STATIC=y
+# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
+CONFIG_RESOURCES_64BIT=y
+# CONFIG_HIGHPTE is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_EFI is not set
+# CONFIG_IRQBALANCE is not set
CONFIG_REGPARM=y
-# CONFIG_SECCOMP is not set
-CONFIG_HZ_100=y
-# CONFIG_HZ_250 is not set
+CONFIG_SECCOMP=y
+# CONFIG_HZ_100 is not set
+CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
-CONFIG_HZ=100
+CONFIG_HZ=250
# CONFIG_KEXEC is not set
+# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x100000
-CONFIG_DOUBLEFAULT=y
+# CONFIG_HOTPLUG_CPU is not set
+CONFIG_COMPAT_VDSO=y
+CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
-# CONFIG_PM_LEGACY is not set
+CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set
-CONFIG_SOFTWARE_SUSPEND=y
-CONFIG_PM_STD_PARTITION=""
#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
-# CONFIG_ACPI_SLEEP is not set
-# CONFIG_ACPI_AC is not set
-# CONFIG_ACPI_BATTERY is not set
-# CONFIG_ACPI_BUTTON is not set
+CONFIG_ACPI_AC=y
+CONFIG_ACPI_BATTERY=y
+CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_VIDEO is not set
# CONFIG_ACPI_HOTKEY is not set
-# CONFIG_ACPI_FAN is not set
-# CONFIG_ACPI_PROCESSOR is not set
+CONFIG_ACPI_FAN=y
+# CONFIG_ACPI_DOCK is not set
+CONFIG_ACPI_PROCESSOR=y
+CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
-CONFIG_ACPI_BLACKLIST_YEAR=0
-# CONFIG_ACPI_DEBUG is not set
+CONFIG_ACPI_BLACKLIST_YEAR=2001
+CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
-# CONFIG_X86_PM_TIMER is not set
+CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set
#
@@ -230,7 +243,41 @@
#
# CPU Frequency scaling
#
-# CONFIG_CPU_FREQ is not set
+CONFIG_CPU_FREQ=y
+CONFIG_CPU_FREQ_TABLE=y
+CONFIG_CPU_FREQ_DEBUG=y
+CONFIG_CPU_FREQ_STAT=y
+# CONFIG_CPU_FREQ_STAT_DETAILS is not set
+CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
+# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
+CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
+# CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
+CONFIG_CPU_FREQ_GOV_USERSPACE=y
+CONFIG_CPU_FREQ_GOV_ONDEMAND=y
+# CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set
+
+#
+# CPUFreq processor drivers
+#
+CONFIG_X86_ACPI_CPUFREQ=y
+# CONFIG_X86_POWERNOW_K6 is not set
+# CONFIG_X86_POWERNOW_K7 is not set
+CONFIG_X86_POWERNOW_K8=y
+CONFIG_X86_POWERNOW_K8_ACPI=y
+# CONFIG_X86_GX_SUSPMOD is not set
+# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
+# CONFIG_X86_SPEEDSTEP_ICH is not set
+# CONFIG_X86_SPEEDSTEP_SMI is not set
+# CONFIG_X86_P4_CLOCKMOD is not set
+# CONFIG_X86_CPUFREQ_NFORCE2 is not set
+# CONFIG_X86_LONGRUN is not set
+# CONFIG_X86_LONGHAUL is not set
+
+#
+# shared options
+#
+CONFIG_X86_ACPI_CPUFREQ_PROC_INTF=y
+# CONFIG_X86_SPEEDSTEP_LIB is not set
#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
@@ -244,12 +291,13 @@
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
# CONFIG_PCIEPORTBUS is not set
-# CONFIG_PCI_MSI is not set
-# CONFIG_PCI_LEGACY_PROC is not set
+CONFIG_PCI_MSI=y
+# CONFIG_PCI_DEBUG is not set
CONFIG_ISA_DMA_API=y
# CONFIG_ISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set
+CONFIG_K8_NB=y
#
# PCCARD (PCMCIA/CardBus) support
@@ -278,93 +326,54 @@
#
# CONFIG_NETDEBUG is not set
CONFIG_PACKET=y
-CONFIG_PACKET_MMAP=y
+# CONFIG_PACKET_MMAP is not set
CONFIG_UNIX=y
+CONFIG_XFRM=y
+# CONFIG_XFRM_USER is not set
+# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_NET_KEY is not set
CONFIG_INET=y
-# CONFIG_IP_MULTICAST is not set
+CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
-# CONFIG_IP_PNP is not set
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+# CONFIG_IP_PNP_BOOTP is not set
+# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
+# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
+# CONFIG_INET_XFRM_TUNNEL is not set
# CONFIG_INET_TUNNEL is not set
-# CONFIG_INET_DIAG is not set
+CONFIG_INET_XFRM_MODE_TRANSPORT=y
+CONFIG_INET_XFRM_MODE_TUNNEL=y
+CONFIG_INET_DIAG=y
+CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
-CONFIG_TCP_CONG_BIC=y
-
-#
-# IP: Virtual Server Configuration
-#
-# CONFIG_IP_VS is not set
-# CONFIG_IPV6 is not set
-CONFIG_NETFILTER=y
-# CONFIG_NETFILTER_DEBUG is not set
-
-#
-# Core Netfilter Configuration
-#
-# CONFIG_NETFILTER_NETLINK is not set
-CONFIG_NETFILTER_XTABLES=y
-# CONFIG_NETFILTER_XT_TARGET_CLASSIFY is not set
-# CONFIG_NETFILTER_XT_TARGET_MARK is not set
-# CONFIG_NETFILTER_XT_TARGET_NFQUEUE is not set
-# CONFIG_NETFILTER_XT_MATCH_COMMENT is not set
-# CONFIG_NETFILTER_XT_MATCH_CONNTRACK is not set
-# CONFIG_NETFILTER_XT_MATCH_DCCP is not set
-# CONFIG_NETFILTER_XT_MATCH_HELPER is not set
-# CONFIG_NETFILTER_XT_MATCH_LENGTH is not set
-CONFIG_NETFILTER_XT_MATCH_LIMIT=y
-CONFIG_NETFILTER_XT_MATCH_MAC=y
-# CONFIG_NETFILTER_XT_MATCH_MARK is not set
-# CONFIG_NETFILTER_XT_MATCH_PKTTYPE is not set
-# CONFIG_NETFILTER_XT_MATCH_REALM is not set
-# CONFIG_NETFILTER_XT_MATCH_SCTP is not set
-CONFIG_NETFILTER_XT_MATCH_STATE=y
-# CONFIG_NETFILTER_XT_MATCH_STRING is not set
-# CONFIG_NETFILTER_XT_MATCH_TCPMSS is not set
-
-#
-# IP: Netfilter Configuration
-#
-CONFIG_IP_NF_CONNTRACK=y
-# CONFIG_IP_NF_CT_ACCT is not set
-# CONFIG_IP_NF_CONNTRACK_MARK is not set
-# CONFIG_IP_NF_CONNTRACK_EVENTS is not set
-# CONFIG_IP_NF_CT_PROTO_SCTP is not set
-CONFIG_IP_NF_FTP=y
-# CONFIG_IP_NF_IRC is not set
-# CONFIG_IP_NF_NETBIOS_NS is not set
-# CONFIG_IP_NF_TFTP is not set
-# CONFIG_IP_NF_AMANDA is not set
-# CONFIG_IP_NF_PPTP is not set
-# CONFIG_IP_NF_QUEUE is not set
-CONFIG_IP_NF_IPTABLES=y
-# CONFIG_IP_NF_MATCH_IPRANGE is not set
-# CONFIG_IP_NF_MATCH_MULTIPORT is not set
-# CONFIG_IP_NF_MATCH_TOS is not set
-# CONFIG_IP_NF_MATCH_RECENT is not set
-# CONFIG_IP_NF_MATCH_ECN is not set
-# CONFIG_IP_NF_MATCH_DSCP is not set
-# CONFIG_IP_NF_MATCH_AH_ESP is not set
-# CONFIG_IP_NF_MATCH_TTL is not set
-# CONFIG_IP_NF_MATCH_OWNER is not set
-# CONFIG_IP_NF_MATCH_ADDRTYPE is not set
-# CONFIG_IP_NF_MATCH_HASHLIMIT is not set
-CONFIG_IP_NF_FILTER=y
-# CONFIG_IP_NF_TARGET_REJECT is not set
-CONFIG_IP_NF_TARGET_LOG=y
-# CONFIG_IP_NF_TARGET_ULOG is not set
-# CONFIG_IP_NF_TARGET_TCPMSS is not set
-# CONFIG_IP_NF_NAT is not set
-# CONFIG_IP_NF_MANGLE is not set
-# CONFIG_IP_NF_RAW is not set
-# CONFIG_IP_NF_ARPTABLES is not set
+CONFIG_TCP_CONG_CUBIC=y
+CONFIG_DEFAULT_TCP_CONG="cubic"
+CONFIG_IPV6=y
+# CONFIG_IPV6_PRIVACY is not set
+# CONFIG_IPV6_ROUTER_PREF is not set
+# CONFIG_INET6_AH is not set
+# CONFIG_INET6_ESP is not set
+# CONFIG_INET6_IPCOMP is not set
+# CONFIG_IPV6_MIP6 is not set
+# CONFIG_INET6_XFRM_TUNNEL is not set
+# CONFIG_INET6_TUNNEL is not set
+CONFIG_INET6_XFRM_MODE_TRANSPORT=y
+CONFIG_INET6_XFRM_MODE_TUNNEL=y
+# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
+# CONFIG_IPV6_TUNNEL is not set
+# CONFIG_IPV6_SUBTREES is not set
+# CONFIG_IPV6_MULTIPLE_TABLES is not set
+# CONFIG_NETWORK_SECMARK is not set
+# CONFIG_NETFILTER is not set
#
# DCCP Configuration (EXPERIMENTAL)
@@ -389,7 +398,6 @@
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
-# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
@@ -402,6 +410,7 @@
# Network testing
#
# CONFIG_NET_PKTGEN is not set
+# CONFIG_NET_TCPPROBE is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
@@ -416,7 +425,9 @@
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
-# CONFIG_FW_LOADER is not set
+CONFIG_FW_LOADER=y
+# CONFIG_DEBUG_DRIVER is not set
+# CONFIG_SYS_HYPERVISOR is not set
#
# Connector - unified userspace <-> kernelspace linker
@@ -431,13 +442,7 @@
#
# Parallel port support
#
-CONFIG_PARPORT=y
-CONFIG_PARPORT_PC=y
-# CONFIG_PARPORT_SERIAL is not set
-# CONFIG_PARPORT_PC_FIFO is not set
-# CONFIG_PARPORT_PC_SUPERIO is not set
-# CONFIG_PARPORT_GSC is not set
-CONFIG_PARPORT_1284=y
+# CONFIG_PARPORT is not set
#
# Plug and Play support
@@ -447,8 +452,7 @@
#
# Block devices
#
-# CONFIG_BLK_DEV_FD is not set
-# CONFIG_PARIDE is not set
+CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
@@ -459,8 +463,11 @@
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
-# CONFIG_BLK_DEV_RAM is not set
+CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
+CONFIG_BLK_DEV_RAM_SIZE=4096
+CONFIG_BLK_DEV_RAM_BLOCKSIZE=1024
+CONFIG_BLK_DEV_INITRD=y
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
@@ -476,7 +483,7 @@
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=y
-# CONFIG_IDEDISK_MULTI_MODE is not set
+CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
@@ -486,10 +493,10 @@
#
# IDE chipset support/bugfixes
#
-# CONFIG_IDE_GENERIC is not set
+CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_CMD640 is not set
CONFIG_BLK_DEV_IDEPCI=y
-CONFIG_IDEPCI_SHARE_IRQ=y
+# CONFIG_IDEPCI_SHARE_IRQ is not set
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_BLK_DEV_OPTI621 is not set
@@ -500,7 +507,7 @@
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
-# CONFIG_BLK_DEV_AMD74XX is not set
+CONFIG_BLK_DEV_AMD74XX=y
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
@@ -511,7 +518,7 @@
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_SC1200 is not set
-# CONFIG_BLK_DEV_PIIX is not set
+CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
@@ -521,7 +528,7 @@
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
-CONFIG_BLK_DEV_VIA82CXXX=y
+# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
@@ -533,6 +540,7 @@
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
+CONFIG_SCSI_NETLINK=y
# CONFIG_SCSI_PROC_FS is not set
#
@@ -541,8 +549,9 @@
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
-# CONFIG_BLK_DEV_SR is not set
-# CONFIG_CHR_DEV_SG is not set
+CONFIG_BLK_DEV_SR=y
+# CONFIG_BLK_DEV_SR_VENDOR is not set
+CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set
#
@@ -553,29 +562,44 @@
# CONFIG_SCSI_LOGGING is not set
#
-# SCSI Transport Attributes
+# SCSI Transports
#
-# CONFIG_SCSI_SPI_ATTRS is not set
-# CONFIG_SCSI_FC_ATTRS is not set
+CONFIG_SCSI_SPI_ATTRS=y
+CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_ATTRS is not set
+# CONFIG_SCSI_SAS_LIBSAS is not set
#
# SCSI low-level drivers
#
# CONFIG_ISCSI_TCP is not set
-# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
+CONFIG_BLK_DEV_3W_XXXX_RAID=y
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
-# CONFIG_SCSI_AIC7XXX is not set
+CONFIG_SCSI_AIC7XXX=y
+CONFIG_AIC7XXX_CMDS_PER_DEVICE=32
+CONFIG_AIC7XXX_RESET_DELAY_MS=5000
+CONFIG_AIC7XXX_DEBUG_ENABLE=y
+CONFIG_AIC7XXX_DEBUG_MASK=0
+CONFIG_AIC7XXX_REG_PRETTY_PRINT=y
# CONFIG_SCSI_AIC7XXX_OLD is not set
-# CONFIG_SCSI_AIC79XX is not set
+CONFIG_SCSI_AIC79XX=y
+CONFIG_AIC79XX_CMDS_PER_DEVICE=32
+CONFIG_AIC79XX_RESET_DELAY_MS=4000
+# CONFIG_AIC79XX_ENABLE_RD_STRM is not set
+# CONFIG_AIC79XX_DEBUG_ENABLE is not set
+CONFIG_AIC79XX_DEBUG_MASK=0
+# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
+# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_DPT_I2O is not set
+# CONFIG_SCSI_ADVANSYS is not set
+# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
-# CONFIG_SCSI_SATA is not set
+# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
@@ -584,11 +608,9 @@
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
-# CONFIG_SCSI_PPA is not set
-# CONFIG_SCSI_IMM is not set
+# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
-# CONFIG_SCSI_QLOGIC_FC is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_LPFC is not set
@@ -598,22 +620,114 @@
# CONFIG_SCSI_DEBUG is not set
#
+# Serial ATA (prod) and Parallel ATA (experimental) drivers
+#
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+CONFIG_SATA_SVW=y
+CONFIG_ATA_PIIX=y
+# CONFIG_SATA_MV is not set
+CONFIG_SATA_NV=y
+# CONFIG_PDC_ADMA is not set
+# CONFIG_SATA_QSTOR is not set
+# CONFIG_SATA_PROMISE is not set
+# CONFIG_SATA_SX4 is not set
+CONFIG_SATA_SIL=y
+# CONFIG_SATA_SIL24 is not set
+# CONFIG_SATA_SIS is not set
+# CONFIG_SATA_ULI is not set
+CONFIG_SATA_VIA=y
+# CONFIG_SATA_VITESSE is not set
+CONFIG_SATA_INTEL_COMBINED=y
+# CONFIG_PATA_ALI is not set
+# CONFIG_PATA_AMD is not set
+# CONFIG_PATA_ARTOP is not set
+# CONFIG_PATA_ATIIXP is not set
+# CONFIG_PATA_CMD64X is not set
+# CONFIG_PATA_CS5520 is not set
+# CONFIG_PATA_CS5530 is not set
+# CONFIG_PATA_CS5535 is not set
+# CONFIG_PATA_CYPRESS is not set
+# CONFIG_PATA_EFAR is not set
+# CONFIG_ATA_GENERIC is not set
+# CONFIG_PATA_HPT366 is not set
+# CONFIG_PATA_HPT37X is not set
+# CONFIG_PATA_HPT3X2N is not set
+# CONFIG_PATA_HPT3X3 is not set
+# CONFIG_PATA_IT821X is not set
+# CONFIG_PATA_JMICRON is not set
+# CONFIG_PATA_LEGACY is not set
+# CONFIG_PATA_TRIFLEX is not set
+# CONFIG_PATA_MPIIX is not set
+# CONFIG_PATA_OLDPIIX is not set
+# CONFIG_PATA_NETCELL is not set
+# CONFIG_PATA_NS87410 is not set
+# CONFIG_PATA_OPTI is not set
+# CONFIG_PATA_OPTIDMA is not set
+# CONFIG_PATA_PDC_OLD is not set
+# CONFIG_PATA_QDI is not set
+# CONFIG_PATA_RADISYS is not set
+# CONFIG_PATA_RZ1000 is not set
+# CONFIG_PATA_SC1200 is not set
+# CONFIG_PATA_SERVERWORKS is not set
+# CONFIG_PATA_PDC2027X is not set
+# CONFIG_PATA_SIL680 is not set
+# CONFIG_PATA_SIS is not set
+# CONFIG_PATA_VIA is not set
+# CONFIG_PATA_WINBOND is not set
+
+#
# Multi-device support (RAID and LVM)
#
-# CONFIG_MD is not set
+CONFIG_MD=y
+# CONFIG_BLK_DEV_MD is not set
+CONFIG_BLK_DEV_DM=y
+# CONFIG_DM_CRYPT is not set
+# CONFIG_DM_SNAPSHOT is not set
+# CONFIG_DM_MIRROR is not set
+# CONFIG_DM_ZERO is not set
+# CONFIG_DM_MULTIPATH is not set
#
# Fusion MPT device support
#
-# CONFIG_FUSION is not set
-# CONFIG_FUSION_SPI is not set
+CONFIG_FUSION=y
+CONFIG_FUSION_SPI=y
# CONFIG_FUSION_FC is not set
# CONFIG_FUSION_SAS is not set
+CONFIG_FUSION_MAX_SGE=128
+# CONFIG_FUSION_CTL is not set
#
# IEEE 1394 (FireWire) support
#
-# CONFIG_IEEE1394 is not set
+CONFIG_IEEE1394=y
+
+#
+# Subsystem Options
+#
+# CONFIG_IEEE1394_VERBOSEDEBUG is not set
+# CONFIG_IEEE1394_OUI_DB is not set
+# CONFIG_IEEE1394_EXTRA_CONFIG_ROMS is not set
+# CONFIG_IEEE1394_EXPORT_FULL_API is not set
+
+#
+# Device Drivers
+#
+
+#
+# Texas Instruments PCILynx requires I2C
+#
+CONFIG_IEEE1394_OHCI1394=y
+
+#
+# Protocol Drivers
+#
+# CONFIG_IEEE1394_VIDEO1394 is not set
+# CONFIG_IEEE1394_SBP2 is not set
+# CONFIG_IEEE1394_ETH1394 is not set
+# CONFIG_IEEE1394_DV1394 is not set
+CONFIG_IEEE1394_RAWIO=y
#
# I2O device support
@@ -652,46 +766,63 @@
#
# Tulip family network device support
#
-# CONFIG_NET_TULIP is not set
+CONFIG_NET_TULIP=y
+# CONFIG_DE2104X is not set
+CONFIG_TULIP=y
+# CONFIG_TULIP_MWI is not set
+# CONFIG_TULIP_MMIO is not set
+# CONFIG_TULIP_NAPI is not set
+# CONFIG_DE4X5 is not set
+# CONFIG_WINBOND_840 is not set
+# CONFIG_DM9102 is not set
+# CONFIG_ULI526X is not set
# CONFIG_HP100 is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
# CONFIG_AMD8111_ETH is not set
# CONFIG_ADAPTEC_STARFIRE is not set
-# CONFIG_B44 is not set
-# CONFIG_FORCEDETH is not set
+CONFIG_B44=y
+CONFIG_FORCEDETH=y
+# CONFIG_FORCEDETH_NAPI is not set
# CONFIG_DGRS is not set
# CONFIG_EEPRO100 is not set
CONFIG_E100=y
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
-# CONFIG_8139CP is not set
-# CONFIG_8139TOO is not set
+CONFIG_8139CP=y
+CONFIG_8139TOO=y
+# CONFIG_8139TOO_PIO is not set
+# CONFIG_8139TOO_TUNE_TWISTER is not set
+# CONFIG_8139TOO_8129 is not set
+# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
# CONFIG_VIA_RHINE is not set
-# CONFIG_NET_POCKET is not set
#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
-# CONFIG_E1000 is not set
+CONFIG_E1000=y
+# CONFIG_E1000_NAPI is not set
+# CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
-# CONFIG_R8169 is not set
+CONFIG_R8169=y
+# CONFIG_R8169_NAPI is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
-# CONFIG_SKY2 is not set
+CONFIG_SKY2=y
# CONFIG_SK98LIN is not set
# CONFIG_VIA_VELOCITY is not set
-# CONFIG_TIGON3 is not set
-# CONFIG_BNX2 is not set
+CONFIG_TIGON3=y
+CONFIG_BNX2=y
+# CONFIG_QLA3XXX is not set
#
# Ethernet (10000 Mbit)
@@ -699,6 +830,7 @@
# CONFIG_CHELSIO_T1 is not set
# CONFIG_IXGB is not set
# CONFIG_S2IO is not set
+# CONFIG_MYRI10GE is not set
#
# Token Ring devices
@@ -716,14 +848,15 @@
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
-# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
-# CONFIG_NETCONSOLE is not set
-# CONFIG_NETPOLL is not set
-# CONFIG_NET_POLL_CONTROLLER is not set
+CONFIG_NETCONSOLE=y
+CONFIG_NETPOLL=y
+# CONFIG_NETPOLL_RX is not set
+# CONFIG_NETPOLL_TRAP is not set
+CONFIG_NET_POLL_CONTROLLER=y
#
# ISDN subsystem
@@ -745,8 +878,8 @@
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
-CONFIG_INPUT_MOUSEDEV_SCREEN_X=1280
-CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1024
+CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
+CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
CONFIG_INPUT_EVDEV=y
@@ -776,7 +909,6 @@
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
-# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
@@ -788,14 +920,15 @@
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
+# CONFIG_VT_HW_CONSOLE_BINDING is not set
# CONFIG_SERIAL_NONSTANDARD is not set
#
# Serial drivers
#
CONFIG_SERIAL_8250=y
-# CONFIG_SERIAL_8250_CONSOLE is not set
-# CONFIG_SERIAL_8250_ACPI is not set
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
@@ -804,14 +937,11 @@
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
+CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
-CONFIG_PRINTER=y
-# CONFIG_LP_CONSOLE is not set
-# CONFIG_PPDEV is not set
-# CONFIG_TIPAR is not set
#
# IPMI
@@ -822,8 +952,12 @@
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
-# CONFIG_HW_RANDOM is not set
-CONFIG_NVRAM=y
+CONFIG_HW_RANDOM=y
+CONFIG_HW_RANDOM_INTEL=y
+CONFIG_HW_RANDOM_AMD=y
+CONFIG_HW_RANDOM_GEODE=y
+CONFIG_HW_RANDOM_VIA=y
+# CONFIG_NVRAM is not set
CONFIG_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
@@ -833,31 +967,28 @@
#
# Ftape, the floppy tape device driver
#
-# CONFIG_FTAPE is not set
CONFIG_AGP=y
# CONFIG_AGP_ALI is not set
# CONFIG_AGP_ATI is not set
# CONFIG_AGP_AMD is not set
-# CONFIG_AGP_AMD64 is not set
-# CONFIG_AGP_INTEL is not set
+CONFIG_AGP_AMD64=y
+CONFIG_AGP_INTEL=y
# CONFIG_AGP_NVIDIA is not set
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_SWORKS is not set
-CONFIG_AGP_VIA=y
+# CONFIG_AGP_VIA is not set
# CONFIG_AGP_EFFICEON is not set
-CONFIG_DRM=y
-# CONFIG_DRM_TDFX is not set
-# CONFIG_DRM_R128 is not set
-CONFIG_DRM_RADEON=y
-# CONFIG_DRM_MGA is not set
-# CONFIG_DRM_SIS is not set
-# CONFIG_DRM_VIA is not set
-# CONFIG_DRM_SAVAGE is not set
+# CONFIG_DRM is not set
# CONFIG_MWAVE is not set
+# CONFIG_PC8736x_GPIO is not set
+# CONFIG_NSC_GPIO is not set
# CONFIG_CS5535_GPIO is not set
-# CONFIG_RAW_DRIVER is not set
-# CONFIG_HPET is not set
-# CONFIG_HANGCHECK_TIMER is not set
+CONFIG_RAW_DRIVER=y
+CONFIG_MAX_RAW_DEVS=256
+CONFIG_HPET=y
+# CONFIG_HPET_RTC_IRQ is not set
+CONFIG_HPET_MMAP=y
+CONFIG_HANGCHECK_TIMER=y
#
# TPM devices
@@ -868,59 +999,7 @@
#
# I2C support
#
-CONFIG_I2C=y
-CONFIG_I2C_CHARDEV=y
-
-#
-# I2C Algorithms
-#
-CONFIG_I2C_ALGOBIT=y
-# CONFIG_I2C_ALGOPCF is not set
-# CONFIG_I2C_ALGOPCA is not set
-
-#
-# I2C Hardware Bus support
-#
-# CONFIG_I2C_ALI1535 is not set
-# CONFIG_I2C_ALI1563 is not set
-# CONFIG_I2C_ALI15X3 is not set
-# CONFIG_I2C_AMD756 is not set
-# CONFIG_I2C_AMD8111 is not set
-# CONFIG_I2C_I801 is not set
-# CONFIG_I2C_I810 is not set
-# CONFIG_I2C_PIIX4 is not set
-CONFIG_I2C_ISA=y
-# CONFIG_I2C_NFORCE2 is not set
-# CONFIG_I2C_PARPORT is not set
-# CONFIG_I2C_PARPORT_LIGHT is not set
-# CONFIG_I2C_PROSAVAGE is not set
-# CONFIG_I2C_SAVAGE4 is not set
-# CONFIG_SCx200_ACB is not set
-# CONFIG_I2C_SIS5595 is not set
-# CONFIG_I2C_SIS630 is not set
-# CONFIG_I2C_SIS96X is not set
-# CONFIG_I2C_STUB is not set
-# CONFIG_I2C_VIA is not set
-CONFIG_I2C_VIAPRO=y
-# CONFIG_I2C_VOODOO3 is not set
-# CONFIG_I2C_PCA_ISA is not set
-
-#
-# Miscellaneous I2C Chip support
-#
-# CONFIG_SENSORS_DS1337 is not set
-# CONFIG_SENSORS_DS1374 is not set
-# CONFIG_SENSORS_EEPROM is not set
-# CONFIG_SENSORS_PCF8574 is not set
-# CONFIG_SENSORS_PCA9539 is not set
-# CONFIG_SENSORS_PCF8591 is not set
-# CONFIG_SENSORS_RTC8564 is not set
-# CONFIG_SENSORS_MAX6875 is not set
-# CONFIG_RTC_X1205_I2C is not set
-# CONFIG_I2C_DEBUG_CORE is not set
-# CONFIG_I2C_DEBUG_ALGO is not set
-# CONFIG_I2C_DEBUG_BUS is not set
-# CONFIG_I2C_DEBUG_CHIP is not set
+# CONFIG_I2C is not set
#
# SPI support
@@ -931,51 +1010,12 @@
#
# Dallas's 1-wire bus
#
-# CONFIG_W1 is not set
#
# Hardware Monitoring support
#
-CONFIG_HWMON=y
-CONFIG_HWMON_VID=y
-# CONFIG_SENSORS_ADM1021 is not set
-# CONFIG_SENSORS_ADM1025 is not set
-# CONFIG_SENSORS_ADM1026 is not set
-# CONFIG_SENSORS_ADM1031 is not set
-# CONFIG_SENSORS_ADM9240 is not set
-# CONFIG_SENSORS_ASB100 is not set
-# CONFIG_SENSORS_ATXP1 is not set
-# CONFIG_SENSORS_DS1621 is not set
-# CONFIG_SENSORS_F71805F is not set
-# CONFIG_SENSORS_FSCHER is not set
-# CONFIG_SENSORS_FSCPOS is not set
-# CONFIG_SENSORS_GL518SM is not set
-# CONFIG_SENSORS_GL520SM is not set
-CONFIG_SENSORS_IT87=y
-# CONFIG_SENSORS_LM63 is not set
-# CONFIG_SENSORS_LM75 is not set
-# CONFIG_SENSORS_LM77 is not set
-# CONFIG_SENSORS_LM78 is not set
-# CONFIG_SENSORS_LM80 is not set
-# CONFIG_SENSORS_LM83 is not set
-# CONFIG_SENSORS_LM85 is not set
-# CONFIG_SENSORS_LM87 is not set
-# CONFIG_SENSORS_LM90 is not set
-# CONFIG_SENSORS_LM92 is not set
-# CONFIG_SENSORS_MAX1619 is not set
-# CONFIG_SENSORS_PC87360 is not set
-# CONFIG_SENSORS_SIS5595 is not set
-# CONFIG_SENSORS_SMSC47M1 is not set
-# CONFIG_SENSORS_SMSC47B397 is not set
-# CONFIG_SENSORS_VIA686A is not set
-# CONFIG_SENSORS_VT8231 is not set
-# CONFIG_SENSORS_W83781D is not set
-# CONFIG_SENSORS_W83792D is not set
-# CONFIG_SENSORS_W83L785TS is not set
-# CONFIG_SENSORS_W83627HF is not set
-# CONFIG_SENSORS_W83627EHF is not set
-# CONFIG_SENSORS_HDAPS is not set
-# CONFIG_HWMON_DEBUG_CHIP is not set
+# CONFIG_HWMON is not set
+# CONFIG_HWMON_VID is not set
#
# Misc devices
@@ -983,117 +1023,31 @@
# CONFIG_IBM_ASM is not set
#
-# Multimedia Capabilities Port drivers
-#
-
-#
# Multimedia devices
#
-CONFIG_VIDEO_DEV=y
-
-#
-# Video For Linux
-#
-
-#
-# Video Adapters
-#
-# CONFIG_VIDEO_ADV_DEBUG is not set
-# CONFIG_VIDEO_BT848 is not set
-# CONFIG_VIDEO_BWQCAM is not set
-# CONFIG_VIDEO_CQCAM is not set
-# CONFIG_VIDEO_W9966 is not set
-# CONFIG_VIDEO_CPIA is not set
-# CONFIG_VIDEO_SAA5246A is not set
-# CONFIG_VIDEO_SAA5249 is not set
-# CONFIG_TUNER_3036 is not set
-# CONFIG_VIDEO_STRADIS is not set
-# CONFIG_VIDEO_ZORAN is not set
-CONFIG_VIDEO_SAA7134=y
-# CONFIG_VIDEO_SAA7134_ALSA is not set
-# CONFIG_VIDEO_MXB is not set
-# CONFIG_VIDEO_DPC is not set
-# CONFIG_VIDEO_HEXIUM_ORION is not set
-# CONFIG_VIDEO_HEXIUM_GEMINI is not set
-# CONFIG_VIDEO_CX88 is not set
-# CONFIG_VIDEO_EM28XX is not set
-# CONFIG_VIDEO_OVCAMCHIP is not set
-# CONFIG_VIDEO_AUDIO_DECODER is not set
-# CONFIG_VIDEO_DECODER is not set
-
-#
-# Radio Adapters
-#
-# CONFIG_RADIO_GEMTEK_PCI is not set
-# CONFIG_RADIO_MAXIRADIO is not set
-# CONFIG_RADIO_MAESTRO is not set
+# CONFIG_VIDEO_DEV is not set
+CONFIG_VIDEO_V4L2=y
#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set
-CONFIG_VIDEO_TUNER=y
-CONFIG_VIDEO_BUF=y
-CONFIG_VIDEO_IR=y
+# CONFIG_USB_DABUSB is not set
#
# Graphics support
#
-CONFIG_FB=y
-CONFIG_FB_CFB_FILLRECT=y
-CONFIG_FB_CFB_COPYAREA=y
-CONFIG_FB_CFB_IMAGEBLIT=y
-# CONFIG_FB_MACMODES is not set
-CONFIG_FB_MODE_HELPERS=y
-# CONFIG_FB_TILEBLITTING is not set
-# CONFIG_FB_CIRRUS is not set
-# CONFIG_FB_PM2 is not set
-# CONFIG_FB_CYBER2000 is not set
-# CONFIG_FB_ARC is not set
-# CONFIG_FB_ASILIANT is not set
-# CONFIG_FB_IMSTT is not set
-# CONFIG_FB_VGA16 is not set
-# CONFIG_FB_VESA is not set
-CONFIG_VIDEO_SELECT=y
-# CONFIG_FB_HGA is not set
-# CONFIG_FB_S1D13XXX is not set
-# CONFIG_FB_NVIDIA is not set
-# CONFIG_FB_RIVA is not set
-# CONFIG_FB_I810 is not set
-# CONFIG_FB_INTEL is not set
-# CONFIG_FB_MATROX is not set
-# CONFIG_FB_RADEON_OLD is not set
-CONFIG_FB_RADEON=y
-CONFIG_FB_RADEON_I2C=y
-# CONFIG_FB_RADEON_DEBUG is not set
-# CONFIG_FB_ATY128 is not set
-# CONFIG_FB_ATY is not set
-# CONFIG_FB_SAVAGE is not set
-# CONFIG_FB_SIS is not set
-# CONFIG_FB_NEOMAGIC is not set
-# CONFIG_FB_KYRO is not set
-# CONFIG_FB_3DFX is not set
-# CONFIG_FB_VOODOO1 is not set
-# CONFIG_FB_CYBLA is not set
-# CONFIG_FB_TRIDENT is not set
-# CONFIG_FB_GEODE is not set
-# CONFIG_FB_VIRTUAL is not set
+CONFIG_FIRMWARE_EDID=y
+# CONFIG_FB is not set
#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
+CONFIG_VGACON_SOFT_SCROLLBACK=y
+CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=128
+CONFIG_VIDEO_SELECT=y
CONFIG_DUMMY_CONSOLE=y
-CONFIG_FRAMEBUFFER_CONSOLE=y
-# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
-# CONFIG_FONTS is not set
-CONFIG_FONT_8x8=y
-CONFIG_FONT_8x16=y
-
-#
-# Logo configuration
-#
-# CONFIG_LOGO is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set
#
@@ -1104,97 +1058,30 @@
#
# Advanced Linux Sound Architecture
#
-CONFIG_SND=y
-CONFIG_SND_TIMER=y
-CONFIG_SND_PCM=y
-CONFIG_SND_RAWMIDI=y
-CONFIG_SND_SEQUENCER=y
-# CONFIG_SND_SEQ_DUMMY is not set
-# CONFIG_SND_MIXER_OSS is not set
-# CONFIG_SND_PCM_OSS is not set
-# CONFIG_SND_SEQUENCER_OSS is not set
-CONFIG_SND_RTCTIMER=y
-CONFIG_SND_SEQ_RTCTIMER_DEFAULT=y
-# CONFIG_SND_DYNAMIC_MINORS is not set
-# CONFIG_SND_SUPPORT_OLD_API is not set
-# CONFIG_SND_VERBOSE_PRINTK is not set
-# CONFIG_SND_DEBUG is not set
-
-#
-# Generic devices
-#
-CONFIG_SND_MPU401_UART=y
-CONFIG_SND_AC97_CODEC=y
-CONFIG_SND_AC97_BUS=y
-# CONFIG_SND_DUMMY is not set
-# CONFIG_SND_VIRMIDI is not set
-# CONFIG_SND_MTPAV is not set
-# CONFIG_SND_SERIAL_U16550 is not set
-# CONFIG_SND_MPU401 is not set
-
-#
-# PCI devices
-#
-# CONFIG_SND_AD1889 is not set
-# CONFIG_SND_ALS4000 is not set
-# CONFIG_SND_ALI5451 is not set
-# CONFIG_SND_ATIIXP is not set
-# CONFIG_SND_ATIIXP_MODEM is not set
-# CONFIG_SND_AU8810 is not set
-# CONFIG_SND_AU8820 is not set
-# CONFIG_SND_AU8830 is not set
-# CONFIG_SND_AZT3328 is not set
-# CONFIG_SND_BT87X is not set
-# CONFIG_SND_CA0106 is not set
-# CONFIG_SND_CMIPCI is not set
-# CONFIG_SND_CS4281 is not set
-# CONFIG_SND_CS46XX is not set
-# CONFIG_SND_CS5535AUDIO is not set
-# CONFIG_SND_EMU10K1 is not set
-# CONFIG_SND_EMU10K1X is not set
-# CONFIG_SND_ENS1370 is not set
-# CONFIG_SND_ENS1371 is not set
-# CONFIG_SND_ES1938 is not set
-# CONFIG_SND_ES1968 is not set
-# CONFIG_SND_FM801 is not set
-# CONFIG_SND_HDA_INTEL is not set
-# CONFIG_SND_HDSP is not set
-# CONFIG_SND_HDSPM is not set
-# CONFIG_SND_ICE1712 is not set
-# CONFIG_SND_ICE1724 is not set
-# CONFIG_SND_INTEL8X0 is not set
-# CONFIG_SND_INTEL8X0M is not set
-# CONFIG_SND_KORG1212 is not set
-# CONFIG_SND_MAESTRO3 is not set
-# CONFIG_SND_MIXART is not set
-# CONFIG_SND_NM256 is not set
-# CONFIG_SND_PCXHR is not set
-# CONFIG_SND_RME32 is not set
-# CONFIG_SND_RME96 is not set
-# CONFIG_SND_RME9652 is not set
-# CONFIG_SND_SONICVIBES is not set
-# CONFIG_SND_TRIDENT is not set
-CONFIG_SND_VIA82XX=y
-# CONFIG_SND_VIA82XX_MODEM is not set
-# CONFIG_SND_VX222 is not set
-# CONFIG_SND_YMFPCI is not set
-
-#
-# USB devices
-#
-# CONFIG_SND_USB_AUDIO is not set
-# CONFIG_SND_USB_USX2Y is not set
+# CONFIG_SND is not set
#
# Open Sound System
#
-# CONFIG_SOUND_PRIME is not set
+CONFIG_SOUND_PRIME=y
+CONFIG_OSS_OBSOLETE_DRIVER=y
+# CONFIG_SOUND_BT878 is not set
+# CONFIG_SOUND_EMU10K1 is not set
+# CONFIG_SOUND_FUSION is not set
+# CONFIG_SOUND_ES1371 is not set
+CONFIG_SOUND_ICH=y
+# CONFIG_SOUND_TRIDENT is not set
+# CONFIG_SOUND_MSNDCLAS is not set
+# CONFIG_SOUND_MSNDPIN is not set
+# CONFIG_SOUND_VIA82CXXX is not set
+# CONFIG_SOUND_OSS is not set
#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
+CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
@@ -1213,17 +1100,19 @@
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_SPLIT_ISO is not set
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
+# CONFIG_USB_EHCI_TT_NEWSCHED is not set
# CONFIG_USB_ISP116X_HCD is not set
-# CONFIG_USB_OHCI_HCD is not set
+CONFIG_USB_OHCI_HCD=y
+# CONFIG_USB_OHCI_BIG_ENDIAN is not set
+CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
#
# USB Device Class drivers
#
-# CONFIG_OBSOLETE_OSS_USB_DRIVER is not set
# CONFIG_USB_ACM is not set
-# CONFIG_USB_PRINTER is not set
+CONFIG_USB_PRINTER=y
#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
@@ -1248,21 +1137,17 @@
#
# USB Input Devices
#
-# CONFIG_USB_HID is not set
-
-#
-# USB HID Boot Protocol drivers
-#
-# CONFIG_USB_KBD is not set
-# CONFIG_USB_MOUSE is not set
+CONFIG_USB_HID=y
+CONFIG_USB_HIDINPUT=y
+# CONFIG_USB_HIDINPUT_POWERBOOK is not set
+# CONFIG_HID_FF is not set
+# CONFIG_USB_HIDDEV is not set
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_ACECAD is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
-# CONFIG_USB_MTOUCH is not set
-# CONFIG_USB_ITMTOUCH is not set
-# CONFIG_USB_EGALAX is not set
+# CONFIG_USB_TOUCHSCREEN is not set
# CONFIG_USB_YEALINK is not set
# CONFIG_USB_XPAD is not set
# CONFIG_USB_ATI_REMOTE is not set
@@ -1277,21 +1162,6 @@
# CONFIG_USB_MICROTEK is not set
#
-# USB Multimedia devices
-#
-# CONFIG_USB_DABUSB is not set
-# CONFIG_USB_VICAM is not set
-# CONFIG_USB_DSBR is not set
-# CONFIG_USB_ET61X251 is not set
-# CONFIG_USB_IBMCAM is not set
-# CONFIG_USB_KONICAWC is not set
-# CONFIG_USB_OV511 is not set
-# CONFIG_USB_SE401 is not set
-# CONFIG_USB_SN9C102 is not set
-# CONFIG_USB_STV680 is not set
-# CONFIG_USB_PWC is not set
-
-#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
@@ -1299,12 +1169,11 @@
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
-# CONFIG_USB_MON is not set
+CONFIG_USB_MON=y
#
# USB port drivers
#
-# CONFIG_USB_USS720 is not set
#
# USB Serial Converter support
@@ -1321,10 +1190,12 @@
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
+# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETKIT is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_IDMOUSE is not set
+# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TEST is not set
@@ -1344,56 +1215,96 @@
# CONFIG_MMC is not set
#
+# LED devices
+#
+# CONFIG_NEW_LEDS is not set
+
+#
+# LED drivers
+#
+
+#
+# LED Triggers
+#
+
+#
# InfiniBand support
#
# CONFIG_INFINIBAND is not set
#
-# SN Devices
+# EDAC - error detection and reporting (RAS) (EXPERIMENTAL)
+#
+# CONFIG_EDAC is not set
+
+#
+# Real Time Clock
+#
+# CONFIG_RTC_CLASS is not set
+
+#
+# DMA Engine support
+#
+# CONFIG_DMA_ENGINE is not set
+
+#
+# DMA Clients
#
#
-# EDAC - error detection and reporting (RAS)
+# DMA Devices
#
-# CONFIG_EDAC is not set
#
# File systems
#
CONFIG_EXT2_FS=y
-# CONFIG_EXT2_FS_XATTR is not set
+CONFIG_EXT2_FS_XATTR=y
+CONFIG_EXT2_FS_POSIX_ACL=y
+# CONFIG_EXT2_FS_SECURITY is not set
# CONFIG_EXT2_FS_XIP is not set
-# CONFIG_EXT3_FS is not set
-# CONFIG_REISERFS_FS is not set
+CONFIG_EXT3_FS=y
+CONFIG_EXT3_FS_XATTR=y
+CONFIG_EXT3_FS_POSIX_ACL=y
+# CONFIG_EXT3_FS_SECURITY is not set
+CONFIG_JBD=y
+# CONFIG_JBD_DEBUG is not set
+CONFIG_FS_MBCACHE=y
+CONFIG_REISERFS_FS=y
+# CONFIG_REISERFS_CHECK is not set
+# CONFIG_REISERFS_PROC_INFO is not set
+CONFIG_REISERFS_FS_XATTR=y
+CONFIG_REISERFS_FS_POSIX_ACL=y
+# CONFIG_REISERFS_FS_SECURITY is not set
# CONFIG_JFS_FS is not set
-# CONFIG_FS_POSIX_ACL is not set
+CONFIG_FS_POSIX_ACL=y
# CONFIG_XFS_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
-# CONFIG_INOTIFY is not set
+CONFIG_INOTIFY=y
+CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
# CONFIG_AUTOFS_FS is not set
-# CONFIG_AUTOFS4_FS is not set
+CONFIG_AUTOFS4_FS=y
# CONFIG_FUSE_FS is not set
#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
-CONFIG_JOLIET=y
-CONFIG_ZISOFS=y
-CONFIG_ZISOFS_FS=y
+# CONFIG_JOLIET is not set
+# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set
#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
-# CONFIG_MSDOS_FS is not set
+CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
-CONFIG_FAT_DEFAULT_CODEPAGE=850
+CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set
@@ -1404,10 +1315,9 @@
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
-# CONFIG_HUGETLBFS is not set
-# CONFIG_HUGETLB_PAGE is not set
+CONFIG_HUGETLBFS=y
+CONFIG_HUGETLB_PAGE=y
CONFIG_RAMFS=y
-# CONFIG_RELAYFS_FS is not set
# CONFIG_CONFIGFS_FS is not set
#
@@ -1430,13 +1340,26 @@
#
# Network File Systems
#
-# CONFIG_NFS_FS is not set
-# CONFIG_NFSD is not set
+CONFIG_NFS_FS=y
+CONFIG_NFS_V3=y
+# CONFIG_NFS_V3_ACL is not set
+# CONFIG_NFS_V4 is not set
+# CONFIG_NFS_DIRECTIO is not set
+CONFIG_NFSD=y
+CONFIG_NFSD_V3=y
+# CONFIG_NFSD_V3_ACL is not set
+# CONFIG_NFSD_V4 is not set
+CONFIG_NFSD_TCP=y
+CONFIG_ROOT_NFS=y
+CONFIG_LOCKD=y
+CONFIG_LOCKD_V4=y
+CONFIG_EXPORTFS=y
+CONFIG_NFS_COMMON=y
+CONFIG_SUNRPC=y
+# CONFIG_RPCSEC_GSS_KRB5 is not set
+# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
-CONFIG_CIFS=y
-# CONFIG_CIFS_STATS is not set
-# CONFIG_CIFS_XATTR is not set
-# CONFIG_CIFS_EXPERIMENTAL is not set
+# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
@@ -1445,33 +1368,18 @@
#
# Partition Types
#
-CONFIG_PARTITION_ADVANCED=y
-# CONFIG_ACORN_PARTITION is not set
-# CONFIG_OSF_PARTITION is not set
-# CONFIG_AMIGA_PARTITION is not set
-# CONFIG_ATARI_PARTITION is not set
-# CONFIG_MAC_PARTITION is not set
+# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
-# CONFIG_BSD_DISKLABEL is not set
-# CONFIG_MINIX_SUBPARTITION is not set
-# CONFIG_SOLARIS_X86_PARTITION is not set
-# CONFIG_UNIXWARE_DISKLABEL is not set
-# CONFIG_LDM_PARTITION is not set
-# CONFIG_SGI_PARTITION is not set
-# CONFIG_ULTRIX_PARTITION is not set
-# CONFIG_SUN_PARTITION is not set
-# CONFIG_KARMA_PARTITION is not set
-# CONFIG_EFI_PARTITION is not set
#
# Native Language Support
#
CONFIG_NLS=y
-CONFIG_NLS_DEFAULT="iso8859-15"
-# CONFIG_NLS_CODEPAGE_437 is not set
+CONFIG_NLS_DEFAULT="iso8859-1"
+CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
-CONFIG_NLS_CODEPAGE_850=y
+# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
@@ -1491,7 +1399,7 @@
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
-# CONFIG_NLS_ASCII is not set
+CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
@@ -1510,20 +1418,50 @@
#
# Instrumentation Support
#
-# CONFIG_PROFILING is not set
-# CONFIG_KPROBES is not set
+CONFIG_PROFILING=y
+CONFIG_OPROFILE=y
+CONFIG_KPROBES=y
#
# Kernel hacking
#
+CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_MAGIC_SYSRQ=y
-# CONFIG_DEBUG_KERNEL is not set
-CONFIG_LOG_BUF_SHIFT=14
+CONFIG_UNUSED_SYMBOLS=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_LOG_BUF_SHIFT=18
+CONFIG_DETECT_SOFTLOCKUP=y
+# CONFIG_SCHEDSTATS is not set
+# CONFIG_DEBUG_SLAB is not set
+# CONFIG_DEBUG_RT_MUTEXES is not set
+# CONFIG_RT_MUTEX_TESTER is not set
+# CONFIG_DEBUG_SPINLOCK is not set
+# CONFIG_DEBUG_MUTEXES is not set
+# CONFIG_DEBUG_RWSEMS is not set
+# CONFIG_DEBUG_LOCK_ALLOC is not set
+# CONFIG_PROVE_LOCKING is not set
+# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
+# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
+# CONFIG_DEBUG_KOBJECT is not set
+# CONFIG_DEBUG_HIGHMEM is not set
CONFIG_DEBUG_BUGVERBOSE=y
+# CONFIG_DEBUG_INFO is not set
+# CONFIG_DEBUG_FS is not set
+# CONFIG_DEBUG_VM is not set
+# CONFIG_FRAME_POINTER is not set
+CONFIG_UNWIND_INFO=y
+CONFIG_STACK_UNWIND=y
+# CONFIG_FORCED_INLINING is not set
+# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_EARLY_PRINTK=y
+CONFIG_DEBUG_STACKOVERFLOW=y
+# CONFIG_DEBUG_STACK_USAGE is not set
+# CONFIG_DEBUG_RODATA is not set
+# CONFIG_4KSTACKS is not set
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
+CONFIG_DOUBLEFAULT=y
#
# Security options
@@ -1537,10 +1475,6 @@
# CONFIG_CRYPTO is not set
#
-# Hardware crypto devices
-#
-
-#
# Library routines
#
# CONFIG_CRC_CCITT is not set
@@ -1548,7 +1482,12 @@
CONFIG_CRC32=y
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
+CONFIG_PLIST=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
+CONFIG_GENERIC_PENDING_IRQ=y
+CONFIG_X86_SMP=y
+CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
+CONFIG_X86_TRAMPOLINE=y
CONFIG_KTIME_SCALAR=y
diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 5427a84..1a884b6 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -4,7 +4,7 @@
extra-y := head.o init_task.o vmlinux.lds
-obj-y := process.o semaphore.o signal.o entry.o traps.o irq.o \
+obj-y := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o bootflag.o \
quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
@@ -81,4 +81,5 @@
$(call if_changed,syscall)
k8-y += ../../x86_64/kernel/k8.o
+stacktrace-y += ../../x86_64/kernel/stacktrace.o
diff --git a/arch/i386/kernel/acpi/Makefile b/arch/i386/kernel/acpi/Makefile
index 7e9ac99..7f7be01 100644
--- a/arch/i386/kernel/acpi/Makefile
+++ b/arch/i386/kernel/acpi/Makefile
@@ -1,5 +1,7 @@
obj-$(CONFIG_ACPI) += boot.o
+ifneq ($(CONFIG_PCI),)
obj-$(CONFIG_X86_IO_APIC) += earlyquirk.o
+endif
obj-$(CONFIG_ACPI_SLEEP) += sleep.o wakeup.o
ifneq ($(CONFIG_ACPI_PROCESSOR),)
diff --git a/arch/i386/kernel/acpi/boot.c b/arch/i386/kernel/acpi/boot.c
index ee003bc..1aaea6a 100644
--- a/arch/i386/kernel/acpi/boot.c
+++ b/arch/i386/kernel/acpi/boot.c
@@ -26,9 +26,12 @@
#include <linux/init.h>
#include <linux/acpi.h>
#include <linux/efi.h>
+#include <linux/cpumask.h>
#include <linux/module.h>
#include <linux/dmi.h>
#include <linux/irq.h>
+#include <linux/bootmem.h>
+#include <linux/ioport.h>
#include <asm/pgtable.h>
#include <asm/io_apic.h>
@@ -36,11 +39,17 @@
#include <asm/io.h>
#include <asm/mpspec.h>
+static int __initdata acpi_force = 0;
+
+#ifdef CONFIG_ACPI
+int acpi_disabled = 0;
+#else
+int acpi_disabled = 1;
+#endif
+EXPORT_SYMBOL(acpi_disabled);
+
#ifdef CONFIG_X86_64
-extern void __init clustered_apic_check(void);
-
-extern int gsi_irq_sharing(int gsi);
#include <asm/proto.h>
static inline int acpi_madt_oem_check(char *oem_id, char *oem_table_id) { return 0; }
@@ -506,16 +515,76 @@
#ifdef CONFIG_ACPI_HOTPLUG_CPU
int acpi_map_lsapic(acpi_handle handle, int *pcpu)
{
- /* TBD */
- return -EINVAL;
+ struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+ union acpi_object *obj;
+ struct acpi_table_lapic *lapic;
+ cpumask_t tmp_map, new_map;
+ u8 physid;
+ int cpu;
+
+ if (ACPI_FAILURE(acpi_evaluate_object(handle, "_MAT", NULL, &buffer)))
+ return -EINVAL;
+
+ if (!buffer.length || !buffer.pointer)
+ return -EINVAL;
+
+ obj = buffer.pointer;
+ if (obj->type != ACPI_TYPE_BUFFER ||
+ obj->buffer.length < sizeof(*lapic)) {
+ kfree(buffer.pointer);
+ return -EINVAL;
+ }
+
+ lapic = (struct acpi_table_lapic *)obj->buffer.pointer;
+
+ if ((lapic->header.type != ACPI_MADT_LAPIC) ||
+ (!lapic->flags.enabled)) {
+ kfree(buffer.pointer);
+ return -EINVAL;
+ }
+
+ physid = lapic->id;
+
+ kfree(buffer.pointer);
+ buffer.length = ACPI_ALLOCATE_BUFFER;
+ buffer.pointer = NULL;
+
+ tmp_map = cpu_present_map;
+ mp_register_lapic(physid, lapic->flags.enabled);
+
+ /*
+ * If mp_register_lapic successfully generates a new logical cpu
+ * number, then the following will get us exactly what was mapped
+ */
+ cpus_andnot(new_map, cpu_present_map, tmp_map);
+ if (cpus_empty(new_map)) {
+ printk ("Unable to map lapic to logical cpu number\n");
+ return -EINVAL;
+ }
+
+ cpu = first_cpu(new_map);
+
+ *pcpu = cpu;
+ return 0;
}
EXPORT_SYMBOL(acpi_map_lsapic);
int acpi_unmap_lsapic(int cpu)
{
- /* TBD */
- return -EINVAL;
+ int i;
+
+ for_each_possible_cpu(i) {
+ if (x86_acpiid_to_apicid[i] == x86_cpu_to_apicid[cpu]) {
+ x86_acpiid_to_apicid[i] = -1;
+ break;
+ }
+ }
+ x86_cpu_to_apicid[cpu] = -1;
+ cpu_clear(cpu, cpu_present_map);
+ num_processors--;
+
+ return (0);
}
EXPORT_SYMBOL(acpi_unmap_lsapic);
@@ -579,6 +648,8 @@
static int __init acpi_parse_hpet(unsigned long phys, unsigned long size)
{
struct acpi_table_hpet *hpet_tbl;
+ struct resource *hpet_res;
+ resource_size_t res_start;
if (!phys || !size)
return -EINVAL;
@@ -594,12 +665,26 @@
"memory.\n");
return -1;
}
+
+#define HPET_RESOURCE_NAME_SIZE 9
+ hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
+ if (hpet_res) {
+ memset(hpet_res, 0, sizeof(*hpet_res));
+ hpet_res->name = (void *)&hpet_res[1];
+ hpet_res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ snprintf((char *)hpet_res->name, HPET_RESOURCE_NAME_SIZE,
+ "HPET %u", hpet_tbl->number);
+ hpet_res->end = (1 * 1024) - 1;
+ }
+
#ifdef CONFIG_X86_64
vxtime.hpet_address = hpet_tbl->addr.addrl |
((long)hpet_tbl->addr.addrh << 32);
printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
hpet_tbl->id, vxtime.hpet_address);
+
+ res_start = vxtime.hpet_address;
#else /* X86 */
{
extern unsigned long hpet_address;
@@ -607,9 +692,17 @@
hpet_address = hpet_tbl->addr.addrl;
printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
hpet_tbl->id, hpet_address);
+
+ res_start = hpet_address;
}
#endif /* X86 */
+ if (hpet_res) {
+ hpet_res->start = res_start;
+ hpet_res->end += res_start;
+ insert_resource(&iomem_resource, hpet_res);
+ }
+
return 0;
}
#else
@@ -860,8 +953,6 @@
return;
}
-extern int acpi_force;
-
#ifdef __i386__
static int __init disable_acpi_irq(struct dmi_system_id *d)
@@ -1163,3 +1254,75 @@
return 0;
}
+
+static int __init parse_acpi(char *arg)
+{
+ if (!arg)
+ return -EINVAL;
+
+ /* "acpi=off" disables both ACPI table parsing and interpreter */
+ if (strcmp(arg, "off") == 0) {
+ disable_acpi();
+ }
+ /* acpi=force to over-ride black-list */
+ else if (strcmp(arg, "force") == 0) {
+ acpi_force = 1;
+ acpi_ht = 1;
+ acpi_disabled = 0;
+ }
+ /* acpi=strict disables out-of-spec workarounds */
+ else if (strcmp(arg, "strict") == 0) {
+ acpi_strict = 1;
+ }
+ /* Limit ACPI just to boot-time to enable HT */
+ else if (strcmp(arg, "ht") == 0) {
+ if (!acpi_force)
+ disable_acpi();
+ acpi_ht = 1;
+ }
+ /* "acpi=noirq" disables ACPI interrupt routing */
+ else if (strcmp(arg, "noirq") == 0) {
+ acpi_noirq_set();
+ } else {
+ /* Core will printk when we return error. */
+ return -EINVAL;
+ }
+ return 0;
+}
+early_param("acpi", parse_acpi);
+
+/* FIXME: Using pci= for an ACPI parameter is a travesty. */
+static int __init parse_pci(char *arg)
+{
+ if (arg && strcmp(arg, "noacpi") == 0)
+ acpi_disable_pci();
+ return 0;
+}
+early_param("pci", parse_pci);
+
+#ifdef CONFIG_X86_IO_APIC
+static int __init parse_acpi_skip_timer_override(char *arg)
+{
+ acpi_skip_timer_override = 1;
+ return 0;
+}
+early_param("acpi_skip_timer_override", parse_acpi_skip_timer_override);
+#endif /* CONFIG_X86_IO_APIC */
+
+static int __init setup_acpi_sci(char *s)
+{
+ if (!s)
+ return -EINVAL;
+ if (!strcmp(s, "edge"))
+ acpi_sci_flags.trigger = 1;
+ else if (!strcmp(s, "level"))
+ acpi_sci_flags.trigger = 3;
+ else if (!strcmp(s, "high"))
+ acpi_sci_flags.polarity = 1;
+ else if (!strcmp(s, "low"))
+ acpi_sci_flags.polarity = 3;
+ else
+ return -EINVAL;
+ return 0;
+}
+early_param("acpi_sci", setup_acpi_sci);
diff --git a/arch/i386/kernel/acpi/earlyquirk.c b/arch/i386/kernel/acpi/earlyquirk.c
index 1649a17..fe799b1 100644
--- a/arch/i386/kernel/acpi/earlyquirk.c
+++ b/arch/i386/kernel/acpi/earlyquirk.c
@@ -48,7 +48,11 @@
int num, slot, func;
/* Assume the machine supports type 1. If not it will
- always read ffffffff and should not have any side effect. */
+ always read ffffffff and should not have any side effect.
+ Actually a few buggy systems can machine check. Allow the user
+ to disable it by command line option at least -AK */
+ if (!early_pci_allowed())
+ return;
/* Poor man's PCI discovery */
for (num = 0; num < 32; num++) {
diff --git a/arch/i386/kernel/apic.c b/arch/i386/kernel/apic.c
index 8c844d0..90faae5 100644
--- a/arch/i386/kernel/apic.c
+++ b/arch/i386/kernel/apic.c
@@ -52,7 +52,18 @@
/*
* Knob to control our willingness to enable the local APIC.
*/
-int enable_local_apic __initdata = 0; /* -1=force-disable, +1=force-enable */
+static int enable_local_apic __initdata = 0; /* -1=force-disable, +1=force-enable */
+
+static inline void lapic_disable(void)
+{
+ enable_local_apic = -1;
+ clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+}
+
+static inline void lapic_enable(void)
+{
+ enable_local_apic = 1;
+}
/*
* Debug level
@@ -586,8 +597,7 @@
printk("No ESR for 82489DX.\n");
}
- if (nmi_watchdog == NMI_LOCAL_APIC)
- setup_apic_nmi_watchdog();
+ setup_apic_nmi_watchdog(NULL);
apic_pm_activate();
}
@@ -1373,3 +1383,18 @@
return 0;
}
+
+static int __init parse_lapic(char *arg)
+{
+ lapic_enable();
+ return 0;
+}
+early_param("lapic", parse_lapic);
+
+static int __init parse_nolapic(char *arg)
+{
+ lapic_disable();
+ return 0;
+}
+early_param("nolapic", parse_nolapic);
+
diff --git a/arch/i386/kernel/cpu/amd.c b/arch/i386/kernel/cpu/amd.c
index e6a2d6b..e475809 100644
--- a/arch/i386/kernel/cpu/amd.c
+++ b/arch/i386/kernel/cpu/amd.c
@@ -22,7 +22,7 @@
extern void vide(void);
__asm__(".align 4\nvide: ret");
-static void __init init_amd(struct cpuinfo_x86 *c)
+static void __cpuinit init_amd(struct cpuinfo_x86 *c)
{
u32 l, h;
int mbytes = num_physpages >> (20-PAGE_SHIFT);
@@ -246,7 +246,7 @@
num_cache_leaves = 3;
}
-static unsigned int amd_size_cache(struct cpuinfo_x86 * c, unsigned int size)
+static unsigned int __cpuinit amd_size_cache(struct cpuinfo_x86 * c, unsigned int size)
{
/* AMD errata T13 (order #21922) */
if ((c->x86 == 6)) {
@@ -259,7 +259,7 @@
return size;
}
-static struct cpu_dev amd_cpu_dev __initdata = {
+static struct cpu_dev amd_cpu_dev __cpuinitdata = {
.c_vendor = "AMD",
.c_ident = { "AuthenticAMD" },
.c_models = {
@@ -275,7 +275,6 @@
},
},
.c_init = init_amd,
- .c_identify = generic_identify,
.c_size_cache = amd_size_cache,
};
diff --git a/arch/i386/kernel/cpu/centaur.c b/arch/i386/kernel/cpu/centaur.c
index bd75629..8c25047 100644
--- a/arch/i386/kernel/cpu/centaur.c
+++ b/arch/i386/kernel/cpu/centaur.c
@@ -9,7 +9,7 @@
#ifdef CONFIG_X86_OOSTORE
-static u32 __init power2(u32 x)
+static u32 __cpuinit power2(u32 x)
{
u32 s=1;
while(s<=x)
@@ -22,7 +22,7 @@
* Set up an actual MCR
*/
-static void __init centaur_mcr_insert(int reg, u32 base, u32 size, int key)
+static void __cpuinit centaur_mcr_insert(int reg, u32 base, u32 size, int key)
{
u32 lo, hi;
@@ -40,7 +40,7 @@
* Shortcut: We know you can't put 4Gig of RAM on a winchip
*/
-static u32 __init ramtop(void) /* 16388 */
+static u32 __cpuinit ramtop(void) /* 16388 */
{
int i;
u32 top = 0;
@@ -91,7 +91,7 @@
* Compute a set of MCR's to give maximum coverage
*/
-static int __init centaur_mcr_compute(int nr, int key)
+static int __cpuinit centaur_mcr_compute(int nr, int key)
{
u32 mem = ramtop();
u32 root = power2(mem);
@@ -166,7 +166,7 @@
return ct;
}
-static void __init centaur_create_optimal_mcr(void)
+static void __cpuinit centaur_create_optimal_mcr(void)
{
int i;
/*
@@ -189,7 +189,7 @@
wrmsr(MSR_IDT_MCR0+i, 0, 0);
}
-static void __init winchip2_create_optimal_mcr(void)
+static void __cpuinit winchip2_create_optimal_mcr(void)
{
u32 lo, hi;
int i;
@@ -227,7 +227,7 @@
* Handle the MCR key on the Winchip 2.
*/
-static void __init winchip2_unprotect_mcr(void)
+static void __cpuinit winchip2_unprotect_mcr(void)
{
u32 lo, hi;
u32 key;
@@ -239,7 +239,7 @@
wrmsr(MSR_IDT_MCR_CTRL, lo, hi);
}
-static void __init winchip2_protect_mcr(void)
+static void __cpuinit winchip2_protect_mcr(void)
{
u32 lo, hi;
@@ -257,7 +257,7 @@
#define RNG_ENABLED (1 << 3)
#define RNG_ENABLE (1 << 6) /* MSR_VIA_RNG */
-static void __init init_c3(struct cpuinfo_x86 *c)
+static void __cpuinit init_c3(struct cpuinfo_x86 *c)
{
u32 lo, hi;
@@ -303,7 +303,7 @@
display_cacheinfo(c);
}
-static void __init init_centaur(struct cpuinfo_x86 *c)
+static void __cpuinit init_centaur(struct cpuinfo_x86 *c)
{
enum {
ECX8=1<<1,
@@ -442,7 +442,7 @@
}
}
-static unsigned int centaur_size_cache(struct cpuinfo_x86 * c, unsigned int size)
+static unsigned int __cpuinit centaur_size_cache(struct cpuinfo_x86 * c, unsigned int size)
{
/* VIA C3 CPUs (670-68F) need further shifting. */
if ((c->x86 == 6) && ((c->x86_model == 7) || (c->x86_model == 8)))
@@ -457,7 +457,7 @@
return size;
}
-static struct cpu_dev centaur_cpu_dev __initdata = {
+static struct cpu_dev centaur_cpu_dev __cpuinitdata = {
.c_vendor = "Centaur",
.c_ident = { "CentaurHauls" },
.c_init = init_centaur,
diff --git a/arch/i386/kernel/cpu/common.c b/arch/i386/kernel/cpu/common.c
index 70c87de..2799baa 100644
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -36,7 +36,7 @@
extern int disable_pse;
-static void default_init(struct cpuinfo_x86 * c)
+static void __cpuinit default_init(struct cpuinfo_x86 * c)
{
/* Not much we can do here... */
/* Check if at least it has cpuid */
@@ -49,7 +49,7 @@
}
}
-static struct cpu_dev default_cpu = {
+static struct cpu_dev __cpuinitdata default_cpu = {
.c_init = default_init,
.c_vendor = "Unknown",
};
@@ -265,7 +265,7 @@
}
}
-void __cpuinit generic_identify(struct cpuinfo_x86 * c)
+static void __cpuinit generic_identify(struct cpuinfo_x86 * c)
{
u32 tfms, xlvl;
int ebx;
@@ -675,7 +675,7 @@
#endif
/* Clear %fs and %gs. */
- asm volatile ("xorl %eax, %eax; movl %eax, %fs; movl %eax, %gs");
+ asm volatile ("movl %0, %%fs; movl %0, %%gs" : : "r" (0));
/* Clear all 6 debug registers: */
set_debugreg(0, 0);
diff --git a/arch/i386/kernel/cpu/cpu.h b/arch/i386/kernel/cpu/cpu.h
index 5a1d4f1..2f6432c 100644
--- a/arch/i386/kernel/cpu/cpu.h
+++ b/arch/i386/kernel/cpu/cpu.h
@@ -24,7 +24,5 @@
extern int get_model_name(struct cpuinfo_x86 *c);
extern void display_cacheinfo(struct cpuinfo_x86 *c);
-extern void generic_identify(struct cpuinfo_x86 * c);
-
extern void early_intel_workaround(struct cpuinfo_x86 *c);
diff --git a/arch/i386/kernel/cpu/cyrix.c b/arch/i386/kernel/cpu/cyrix.c
index f03b7f94..c0c3b59 100644
--- a/arch/i386/kernel/cpu/cyrix.c
+++ b/arch/i386/kernel/cpu/cyrix.c
@@ -12,7 +12,7 @@
/*
* Read NSC/Cyrix DEVID registers (DIR) to get more detailed info. about the CPU
*/
-static void __init do_cyrix_devid(unsigned char *dir0, unsigned char *dir1)
+static void __cpuinit do_cyrix_devid(unsigned char *dir0, unsigned char *dir1)
{
unsigned char ccr2, ccr3;
unsigned long flags;
@@ -52,25 +52,25 @@
* Actually since bugs.h doesn't even reference this perhaps someone should
* fix the documentation ???
*/
-static unsigned char Cx86_dir0_msb __initdata = 0;
+static unsigned char Cx86_dir0_msb __cpuinitdata = 0;
-static char Cx86_model[][9] __initdata = {
+static char Cx86_model[][9] __cpuinitdata = {
"Cx486", "Cx486", "5x86 ", "6x86", "MediaGX ", "6x86MX ",
"M II ", "Unknown"
};
-static char Cx486_name[][5] __initdata = {
+static char Cx486_name[][5] __cpuinitdata = {
"SLC", "DLC", "SLC2", "DLC2", "SRx", "DRx",
"SRx2", "DRx2"
};
-static char Cx486S_name[][4] __initdata = {
+static char Cx486S_name[][4] __cpuinitdata = {
"S", "S2", "Se", "S2e"
};
-static char Cx486D_name[][4] __initdata = {
+static char Cx486D_name[][4] __cpuinitdata = {
"DX", "DX2", "?", "?", "?", "DX4"
};
-static char Cx86_cb[] __initdata = "?.5x Core/Bus Clock";
-static char cyrix_model_mult1[] __initdata = "12??43";
-static char cyrix_model_mult2[] __initdata = "12233445";
+static char Cx86_cb[] __cpuinitdata = "?.5x Core/Bus Clock";
+static char cyrix_model_mult1[] __cpuinitdata = "12??43";
+static char cyrix_model_mult2[] __cpuinitdata = "12233445";
/*
* Reset the slow-loop (SLOP) bit on the 686(L) which is set by some old
@@ -82,7 +82,7 @@
extern void calibrate_delay(void) __init;
-static void __init check_cx686_slop(struct cpuinfo_x86 *c)
+static void __cpuinit check_cx686_slop(struct cpuinfo_x86 *c)
{
unsigned long flags;
@@ -107,7 +107,7 @@
}
-static void __init set_cx86_reorder(void)
+static void __cpuinit set_cx86_reorder(void)
{
u8 ccr3;
@@ -122,7 +122,7 @@
setCx86(CX86_CCR3, ccr3);
}
-static void __init set_cx86_memwb(void)
+static void __cpuinit set_cx86_memwb(void)
{
u32 cr0;
@@ -137,7 +137,7 @@
setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x14 );
}
-static void __init set_cx86_inc(void)
+static void __cpuinit set_cx86_inc(void)
{
unsigned char ccr3;
@@ -158,7 +158,7 @@
* Configure later MediaGX and/or Geode processor.
*/
-static void __init geode_configure(void)
+static void __cpuinit geode_configure(void)
{
unsigned long flags;
u8 ccr3, ccr4;
@@ -184,14 +184,14 @@
#ifdef CONFIG_PCI
-static struct pci_device_id __initdata cyrix_55x0[] = {
+static struct pci_device_id __cpuinitdata cyrix_55x0[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5510) },
{ PCI_DEVICE(PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5520) },
{ },
};
#endif
-static void __init init_cyrix(struct cpuinfo_x86 *c)
+static void __cpuinit init_cyrix(struct cpuinfo_x86 *c)
{
unsigned char dir0, dir0_msn, dir0_lsn, dir1 = 0;
char *buf = c->x86_model_id;
@@ -346,7 +346,7 @@
/*
* Handle National Semiconductor branded processors
*/
-static void __init init_nsc(struct cpuinfo_x86 *c)
+static void __cpuinit init_nsc(struct cpuinfo_x86 *c)
{
/* There may be GX1 processors in the wild that are branded
* NSC and not Cyrix.
@@ -394,7 +394,7 @@
return (unsigned char) (test >> 8) == 0x02;
}
-static void cyrix_identify(struct cpuinfo_x86 * c)
+static void __cpuinit cyrix_identify(struct cpuinfo_x86 * c)
{
/* Detect Cyrix with disabled CPUID */
if ( c->x86 == 4 && test_cyrix_52div() ) {
@@ -427,10 +427,9 @@
local_irq_restore(flags);
}
}
- generic_identify(c);
}
-static struct cpu_dev cyrix_cpu_dev __initdata = {
+static struct cpu_dev cyrix_cpu_dev __cpuinitdata = {
.c_vendor = "Cyrix",
.c_ident = { "CyrixInstead" },
.c_init = init_cyrix,
@@ -453,11 +452,10 @@
late_initcall(cyrix_exit_cpu);
-static struct cpu_dev nsc_cpu_dev __initdata = {
+static struct cpu_dev nsc_cpu_dev __cpuinitdata = {
.c_vendor = "NSC",
.c_ident = { "Geode by NSC" },
.c_init = init_nsc,
- .c_identify = generic_identify,
};
int __init nsc_init_cpu(void)
diff --git a/arch/i386/kernel/cpu/intel.c b/arch/i386/kernel/cpu/intel.c
index 5a2e270..94a95aa 100644
--- a/arch/i386/kernel/cpu/intel.c
+++ b/arch/i386/kernel/cpu/intel.c
@@ -198,7 +198,7 @@
}
-static unsigned int intel_size_cache(struct cpuinfo_x86 * c, unsigned int size)
+static unsigned int __cpuinit intel_size_cache(struct cpuinfo_x86 * c, unsigned int size)
{
/* Intel PIII Tualatin. This comes in two flavours.
* One has 256kb of cache, the other 512. We have no way
@@ -263,7 +263,6 @@
},
},
.c_init = init_intel,
- .c_identify = generic_identify,
.c_size_cache = intel_size_cache,
};
diff --git a/arch/i386/kernel/cpu/mcheck/Makefile b/arch/i386/kernel/cpu/mcheck/Makefile
index 30808f3..f1ebe1c 100644
--- a/arch/i386/kernel/cpu/mcheck/Makefile
+++ b/arch/i386/kernel/cpu/mcheck/Makefile
@@ -1,2 +1,2 @@
-obj-y = mce.o k7.o p4.o p5.o p6.o winchip.o
+obj-y = mce.o k7.o p4.o p5.o p6.o winchip.o therm_throt.o
obj-$(CONFIG_X86_MCE_NONFATAL) += non-fatal.o
diff --git a/arch/i386/kernel/cpu/mcheck/p4.c b/arch/i386/kernel/cpu/mcheck/p4.c
index b95f1b3d..504434a 100644
--- a/arch/i386/kernel/cpu/mcheck/p4.c
+++ b/arch/i386/kernel/cpu/mcheck/p4.c
@@ -13,6 +13,8 @@
#include <asm/msr.h>
#include <asm/apic.h>
+#include <asm/therm_throt.h>
+
#include "mce.h"
/* as supported by the P4/Xeon family */
@@ -44,25 +46,12 @@
/* P4/Xeon Thermal transition interrupt handler */
static void intel_thermal_interrupt(struct pt_regs *regs)
{
- u32 l, h;
- unsigned int cpu = smp_processor_id();
- static unsigned long next[NR_CPUS];
+ __u64 msr_val;
ack_APIC_irq();
- if (time_after(next[cpu], jiffies))
- return;
-
- next[cpu] = jiffies + HZ*5;
- rdmsr(MSR_IA32_THERM_STATUS, l, h);
- if (l & 0x1) {
- printk(KERN_EMERG "CPU%d: Temperature above threshold\n", cpu);
- printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n",
- cpu);
- add_taint(TAINT_MACHINE_CHECK);
- } else {
- printk(KERN_INFO "CPU%d: Temperature/speed normal\n", cpu);
- }
+ rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
+ therm_throt_process(msr_val & 0x1);
}
/* Thermal interrupt handler for this CPU setup */
@@ -122,10 +111,13 @@
rdmsr (MSR_IA32_MISC_ENABLE, l, h);
wrmsr (MSR_IA32_MISC_ENABLE, l | (1<<3), h);
-
+
l = apic_read (APIC_LVTTHMR);
apic_write_around (APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
printk (KERN_INFO "CPU%d: Thermal monitoring enabled\n", cpu);
+
+ /* enable thermal throttle processing */
+ atomic_set(&therm_throt_en, 1);
return;
}
#endif /* CONFIG_X86_MCE_P4THERMAL */
diff --git a/arch/i386/kernel/cpu/mcheck/therm_throt.c b/arch/i386/kernel/cpu/mcheck/therm_throt.c
new file mode 100644
index 0000000..4f43047
--- /dev/null
+++ b/arch/i386/kernel/cpu/mcheck/therm_throt.c
@@ -0,0 +1,180 @@
+/*
+ * linux/arch/i386/kerne/cpu/mcheck/therm_throt.c
+ *
+ * Thermal throttle event support code (such as syslog messaging and rate
+ * limiting) that was factored out from x86_64 (mce_intel.c) and i386 (p4.c).
+ * This allows consistent reporting of CPU thermal throttle events.
+ *
+ * Maintains a counter in /sys that keeps track of the number of thermal
+ * events, such that the user knows how bad the thermal problem might be
+ * (since the logging to syslog and mcelog is rate limited).
+ *
+ * Author: Dmitriy Zavin (dmitriyz@google.com)
+ *
+ * Credits: Adapted from Zwane Mwaikambo's original code in mce_intel.c.
+ * Inspired by Ross Biro's and Al Borchers' counter code.
+ */
+
+#include <linux/percpu.h>
+#include <linux/sysdev.h>
+#include <linux/cpu.h>
+#include <asm/cpu.h>
+#include <linux/notifier.h>
+#include <asm/therm_throt.h>
+
+/* How long to wait between reporting thermal events */
+#define CHECK_INTERVAL (300 * HZ)
+
+static DEFINE_PER_CPU(__u64, next_check) = INITIAL_JIFFIES;
+static DEFINE_PER_CPU(unsigned long, thermal_throttle_count);
+atomic_t therm_throt_en = ATOMIC_INIT(0);
+
+#ifdef CONFIG_SYSFS
+#define define_therm_throt_sysdev_one_ro(_name) \
+ static SYSDEV_ATTR(_name, 0444, therm_throt_sysdev_show_##_name, NULL)
+
+#define define_therm_throt_sysdev_show_func(name) \
+static ssize_t therm_throt_sysdev_show_##name(struct sys_device *dev, \
+ char *buf) \
+{ \
+ unsigned int cpu = dev->id; \
+ ssize_t ret; \
+ \
+ preempt_disable(); /* CPU hotplug */ \
+ if (cpu_online(cpu)) \
+ ret = sprintf(buf, "%lu\n", \
+ per_cpu(thermal_throttle_##name, cpu)); \
+ else \
+ ret = 0; \
+ preempt_enable(); \
+ \
+ return ret; \
+}
+
+define_therm_throt_sysdev_show_func(count);
+define_therm_throt_sysdev_one_ro(count);
+
+static struct attribute *thermal_throttle_attrs[] = {
+ &attr_count.attr,
+ NULL
+};
+
+static struct attribute_group thermal_throttle_attr_group = {
+ .attrs = thermal_throttle_attrs,
+ .name = "thermal_throttle"
+};
+#endif /* CONFIG_SYSFS */
+
+/***
+ * therm_throt_process - Process thermal throttling event from interrupt
+ * @curr: Whether the condition is current or not (boolean), since the
+ * thermal interrupt normally gets called both when the thermal
+ * event begins and once the event has ended.
+ *
+ * This function is called by the thermal interrupt after the
+ * IRQ has been acknowledged.
+ *
+ * It will take care of rate limiting and printing messages to the syslog.
+ *
+ * Returns: 0 : Event should NOT be further logged, i.e. still in
+ * "timeout" from previous log message.
+ * 1 : Event should be logged further, and a message has been
+ * printed to the syslog.
+ */
+int therm_throt_process(int curr)
+{
+ unsigned int cpu = smp_processor_id();
+ __u64 tmp_jiffs = get_jiffies_64();
+
+ if (curr)
+ __get_cpu_var(thermal_throttle_count)++;
+
+ if (time_before64(tmp_jiffs, __get_cpu_var(next_check)))
+ return 0;
+
+ __get_cpu_var(next_check) = tmp_jiffs + CHECK_INTERVAL;
+
+ /* if we just entered the thermal event */
+ if (curr) {
+ printk(KERN_CRIT "CPU%d: Temperature above threshold, "
+ "cpu clock throttled (total events = %lu)\n", cpu,
+ __get_cpu_var(thermal_throttle_count));
+
+ add_taint(TAINT_MACHINE_CHECK);
+ } else {
+ printk(KERN_CRIT "CPU%d: Temperature/speed normal\n", cpu);
+ }
+
+ return 1;
+}
+
+#ifdef CONFIG_SYSFS
+/* Add/Remove thermal_throttle interface for CPU device */
+static __cpuinit int thermal_throttle_add_dev(struct sys_device * sys_dev)
+{
+ sysfs_create_group(&sys_dev->kobj, &thermal_throttle_attr_group);
+ return 0;
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static __cpuinit int thermal_throttle_remove_dev(struct sys_device * sys_dev)
+{
+ sysfs_remove_group(&sys_dev->kobj, &thermal_throttle_attr_group);
+ return 0;
+}
+
+/* Mutex protecting device creation against CPU hotplug */
+static DEFINE_MUTEX(therm_cpu_lock);
+
+/* Get notified when a cpu comes on/off. Be hotplug friendly. */
+static __cpuinit int thermal_throttle_cpu_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *hcpu)
+{
+ unsigned int cpu = (unsigned long)hcpu;
+ struct sys_device *sys_dev;
+
+ sys_dev = get_cpu_sysdev(cpu);
+ mutex_lock(&therm_cpu_lock);
+ switch (action) {
+ case CPU_ONLINE:
+ thermal_throttle_add_dev(sys_dev);
+ break;
+ case CPU_DEAD:
+ thermal_throttle_remove_dev(sys_dev);
+ break;
+ }
+ mutex_unlock(&therm_cpu_lock);
+ return NOTIFY_OK;
+}
+
+static struct notifier_block thermal_throttle_cpu_notifier =
+{
+ .notifier_call = thermal_throttle_cpu_callback,
+};
+#endif /* CONFIG_HOTPLUG_CPU */
+
+static __init int thermal_throttle_init_device(void)
+{
+ unsigned int cpu = 0;
+
+ if (!atomic_read(&therm_throt_en))
+ return 0;
+
+ register_hotcpu_notifier(&thermal_throttle_cpu_notifier);
+
+#ifdef CONFIG_HOTPLUG_CPU
+ mutex_lock(&therm_cpu_lock);
+#endif
+ /* connect live CPUs to sysfs */
+ for_each_online_cpu(cpu)
+ thermal_throttle_add_dev(get_cpu_sysdev(cpu));
+#ifdef CONFIG_HOTPLUG_CPU
+ mutex_unlock(&therm_cpu_lock);
+#endif
+
+ return 0;
+}
+
+device_initcall(thermal_throttle_init_device);
+#endif /* CONFIG_SYSFS */
diff --git a/arch/i386/kernel/cpu/nexgen.c b/arch/i386/kernel/cpu/nexgen.c
index ad87fa5..8bf23cc 100644
--- a/arch/i386/kernel/cpu/nexgen.c
+++ b/arch/i386/kernel/cpu/nexgen.c
@@ -10,7 +10,7 @@
* to have CPUID. (Thanks to Herbert Oppmann)
*/
-static int __init deep_magic_nexgen_probe(void)
+static int __cpuinit deep_magic_nexgen_probe(void)
{
int ret;
@@ -27,21 +27,20 @@
return ret;
}
-static void __init init_nexgen(struct cpuinfo_x86 * c)
+static void __cpuinit init_nexgen(struct cpuinfo_x86 * c)
{
c->x86_cache_size = 256; /* A few had 1 MB... */
}
-static void __init nexgen_identify(struct cpuinfo_x86 * c)
+static void __cpuinit nexgen_identify(struct cpuinfo_x86 * c)
{
/* Detect NexGen with old hypercode */
if ( deep_magic_nexgen_probe() ) {
strcpy(c->x86_vendor_id, "NexGenDriven");
}
- generic_identify(c);
}
-static struct cpu_dev nexgen_cpu_dev __initdata = {
+static struct cpu_dev nexgen_cpu_dev __cpuinitdata = {
.c_vendor = "Nexgen",
.c_ident = { "NexGenDriven" },
.c_models = {
diff --git a/arch/i386/kernel/cpu/proc.c b/arch/i386/kernel/cpu/proc.c
index f54a152..76aac08 100644
--- a/arch/i386/kernel/cpu/proc.c
+++ b/arch/i386/kernel/cpu/proc.c
@@ -46,8 +46,8 @@
/* Intel-defined (#2) */
"pni", NULL, NULL, "monitor", "ds_cpl", "vmx", "smx", "est",
- "tm2", NULL, "cid", NULL, NULL, "cx16", "xtpr", NULL,
- NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+ "tm2", "ssse3", "cid", NULL, NULL, "cx16", "xtpr", NULL,
+ NULL, NULL, "dca", NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* VIA/Cyrix/Centaur-defined */
diff --git a/arch/i386/kernel/cpu/rise.c b/arch/i386/kernel/cpu/rise.c
index d08d5a2..9317f74 100644
--- a/arch/i386/kernel/cpu/rise.c
+++ b/arch/i386/kernel/cpu/rise.c
@@ -5,7 +5,7 @@
#include "cpu.h"
-static void __init init_rise(struct cpuinfo_x86 *c)
+static void __cpuinit init_rise(struct cpuinfo_x86 *c)
{
printk("CPU: Rise iDragon");
if (c->x86_model > 2)
@@ -28,7 +28,7 @@
set_bit(X86_FEATURE_CX8, c->x86_capability);
}
-static struct cpu_dev rise_cpu_dev __initdata = {
+static struct cpu_dev rise_cpu_dev __cpuinitdata = {
.c_vendor = "Rise",
.c_ident = { "RiseRiseRise" },
.c_models = {
diff --git a/arch/i386/kernel/cpu/transmeta.c b/arch/i386/kernel/cpu/transmeta.c
index 7214c9b..4056fb7 100644
--- a/arch/i386/kernel/cpu/transmeta.c
+++ b/arch/i386/kernel/cpu/transmeta.c
@@ -5,7 +5,7 @@
#include <asm/msr.h>
#include "cpu.h"
-static void __init init_transmeta(struct cpuinfo_x86 *c)
+static void __cpuinit init_transmeta(struct cpuinfo_x86 *c)
{
unsigned int cap_mask, uk, max, dummy;
unsigned int cms_rev1, cms_rev2;
@@ -85,10 +85,9 @@
#endif
}
-static void __init transmeta_identify(struct cpuinfo_x86 * c)
+static void __cpuinit transmeta_identify(struct cpuinfo_x86 * c)
{
u32 xlvl;
- generic_identify(c);
/* Transmeta-defined flags: level 0x80860001 */
xlvl = cpuid_eax(0x80860000);
@@ -98,7 +97,7 @@
}
}
-static struct cpu_dev transmeta_cpu_dev __initdata = {
+static struct cpu_dev transmeta_cpu_dev __cpuinitdata = {
.c_vendor = "Transmeta",
.c_ident = { "GenuineTMx86", "TransmetaCPU" },
.c_init = init_transmeta,
diff --git a/arch/i386/kernel/cpu/umc.c b/arch/i386/kernel/cpu/umc.c
index 2cd988f..1bf3f87 100644
--- a/arch/i386/kernel/cpu/umc.c
+++ b/arch/i386/kernel/cpu/umc.c
@@ -5,12 +5,8 @@
/* UMC chips appear to be only either 386 or 486, so no special init takes place.
*/
-static void __init init_umc(struct cpuinfo_x86 * c)
-{
-}
-
-static struct cpu_dev umc_cpu_dev __initdata = {
+static struct cpu_dev umc_cpu_dev __cpuinitdata = {
.c_vendor = "UMC",
.c_ident = { "UMC UMC UMC" },
.c_models = {
@@ -21,7 +17,6 @@
}
},
},
- .c_init = init_umc,
};
int __init umc_init_cpu(void)
diff --git a/arch/i386/kernel/crash.c b/arch/i386/kernel/crash.c
index 5b96f03..67d297d 100644
--- a/arch/i386/kernel/crash.c
+++ b/arch/i386/kernel/crash.c
@@ -22,6 +22,8 @@
#include <asm/nmi.h>
#include <asm/hw_irq.h>
#include <asm/apic.h>
+#include <asm/kdebug.h>
+
#include <mach_ipi.h>
@@ -93,16 +95,25 @@
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
static atomic_t waiting_for_crash_ipi;
-static int crash_nmi_callback(struct pt_regs *regs, int cpu)
+static int crash_nmi_callback(struct notifier_block *self,
+ unsigned long val, void *data)
{
+ struct pt_regs *regs;
struct pt_regs fixed_regs;
+ int cpu;
+
+ if (val != DIE_NMI_IPI)
+ return NOTIFY_OK;
+
+ regs = ((struct die_args *)data)->regs;
+ cpu = raw_smp_processor_id();
/* Don't do anything if this handler is invoked on crashing cpu.
* Otherwise, system will completely hang. Crashing cpu can get
* an NMI if system was initially booted with nmi_watchdog parameter.
*/
if (cpu == crashing_cpu)
- return 1;
+ return NOTIFY_STOP;
local_irq_disable();
if (!user_mode_vm(regs)) {
@@ -125,13 +136,18 @@
send_IPI_allbutself(NMI_VECTOR);
}
+static struct notifier_block crash_nmi_nb = {
+ .notifier_call = crash_nmi_callback,
+};
+
static void nmi_shootdown_cpus(void)
{
unsigned long msecs;
atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
/* Would it be better to replace the trap vector here? */
- set_nmi_callback(crash_nmi_callback);
+ if (register_die_notifier(&crash_nmi_nb))
+ return; /* return what? */
/* Ensure the new callback function is set before sending
* out the NMI
*/
diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
index 87f9f60..5a63d6f 100644
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -76,8 +76,15 @@
NT_MASK = 0x00004000
VM_MASK = 0x00020000
+/* These are replaces for paravirtualization */
+#define DISABLE_INTERRUPTS cli
+#define ENABLE_INTERRUPTS sti
+#define ENABLE_INTERRUPTS_SYSEXIT sti; sysexit
+#define INTERRUPT_RETURN iret
+#define GET_CR0_INTO_EAX movl %cr0, %eax
+
#ifdef CONFIG_PREEMPT
-#define preempt_stop cli; TRACE_IRQS_OFF
+#define preempt_stop DISABLE_INTERRUPTS; TRACE_IRQS_OFF
#else
#define preempt_stop
#define resume_kernel restore_nocheck
@@ -176,18 +183,21 @@
#define RING0_INT_FRAME \
CFI_STARTPROC simple;\
+ CFI_SIGNAL_FRAME;\
CFI_DEF_CFA esp, 3*4;\
/*CFI_OFFSET cs, -2*4;*/\
CFI_OFFSET eip, -3*4
#define RING0_EC_FRAME \
CFI_STARTPROC simple;\
+ CFI_SIGNAL_FRAME;\
CFI_DEF_CFA esp, 4*4;\
/*CFI_OFFSET cs, -2*4;*/\
CFI_OFFSET eip, -3*4
#define RING0_PTREGS_FRAME \
CFI_STARTPROC simple;\
+ CFI_SIGNAL_FRAME;\
CFI_DEF_CFA esp, OLDESP-EBX;\
/*CFI_OFFSET cs, CS-OLDESP;*/\
CFI_OFFSET eip, EIP-OLDESP;\
@@ -233,10 +243,11 @@
check_userspace:
movl EFLAGS(%esp), %eax # mix EFLAGS and CS
movb CS(%esp), %al
- testl $(VM_MASK | 3), %eax
- jz resume_kernel
+ andl $(VM_MASK | SEGMENT_RPL_MASK), %eax
+ cmpl $USER_RPL, %eax
+ jb resume_kernel # not returning to v8086 or userspace
ENTRY(resume_userspace)
- cli # make sure we don't miss an interrupt
+ DISABLE_INTERRUPTS # make sure we don't miss an interrupt
# setting need_resched or sigpending
# between sampling and the iret
movl TI_flags(%ebp), %ecx
@@ -247,7 +258,7 @@
#ifdef CONFIG_PREEMPT
ENTRY(resume_kernel)
- cli
+ DISABLE_INTERRUPTS
cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ?
jnz restore_nocheck
need_resched:
@@ -267,6 +278,7 @@
# sysenter call handler stub
ENTRY(sysenter_entry)
CFI_STARTPROC simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA esp, 0
CFI_REGISTER esp, ebp
movl TSS_sysenter_esp0(%esp),%esp
@@ -275,7 +287,7 @@
* No need to follow this irqs on/off section: the syscall
* disabled irqs and here we enable it straight after entry:
*/
- sti
+ ENABLE_INTERRUPTS
pushl $(__USER_DS)
CFI_ADJUST_CFA_OFFSET 4
/*CFI_REL_OFFSET ss, 0*/
@@ -320,7 +332,7 @@
jae syscall_badsys
call *sys_call_table(,%eax,4)
movl %eax,EAX(%esp)
- cli
+ DISABLE_INTERRUPTS
TRACE_IRQS_OFF
movl TI_flags(%ebp), %ecx
testw $_TIF_ALLWORK_MASK, %cx
@@ -330,8 +342,7 @@
movl OLDESP(%esp), %ecx
xorl %ebp,%ebp
TRACE_IRQS_ON
- sti
- sysexit
+ ENABLE_INTERRUPTS_SYSEXIT
CFI_ENDPROC
@@ -356,7 +367,7 @@
call *sys_call_table(,%eax,4)
movl %eax,EAX(%esp) # store the return value
syscall_exit:
- cli # make sure we don't miss an interrupt
+ DISABLE_INTERRUPTS # make sure we don't miss an interrupt
# setting need_resched or sigpending
# between sampling and the iret
TRACE_IRQS_OFF
@@ -371,8 +382,8 @@
# See comments in process.c:copy_thread() for details.
movb OLDSS(%esp), %ah
movb CS(%esp), %al
- andl $(VM_MASK | (4 << 8) | 3), %eax
- cmpl $((4 << 8) | 3), %eax
+ andl $(VM_MASK | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
+ cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
CFI_REMEMBER_STATE
je ldt_ss # returning to user-space with LDT SS
restore_nocheck:
@@ -381,11 +392,11 @@
RESTORE_REGS
addl $4, %esp
CFI_ADJUST_CFA_OFFSET -4
-1: iret
+1: INTERRUPT_RETURN
.section .fixup,"ax"
iret_exc:
TRACE_IRQS_ON
- sti
+ ENABLE_INTERRUPTS
pushl $0 # no error code
pushl $do_iret_error
jmp error_code
@@ -409,7 +420,7 @@
* dosemu and wine happy. */
subl $8, %esp # reserve space for switch16 pointer
CFI_ADJUST_CFA_OFFSET 8
- cli
+ DISABLE_INTERRUPTS
TRACE_IRQS_OFF
movl %esp, %eax
/* Set up the 16bit stack frame with switch32 pointer on top,
@@ -419,7 +430,7 @@
TRACE_IRQS_IRET
RESTORE_REGS
lss 20+4(%esp), %esp # switch to 16bit stack
-1: iret
+1: INTERRUPT_RETURN
.section __ex_table,"a"
.align 4
.long 1b,iret_exc
@@ -434,7 +445,7 @@
jz work_notifysig
work_resched:
call schedule
- cli # make sure we don't miss an interrupt
+ DISABLE_INTERRUPTS # make sure we don't miss an interrupt
# setting need_resched or sigpending
# between sampling and the iret
TRACE_IRQS_OFF
@@ -490,7 +501,7 @@
testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl
jz work_pending
TRACE_IRQS_ON
- sti # could let do_syscall_trace() call
+ ENABLE_INTERRUPTS # could let do_syscall_trace() call
# schedule() instead
movl %esp, %eax
movl $1, %edx
@@ -591,11 +602,9 @@
/* The include is where all of the SMP etc. interrupts come from */
#include "entry_arch.h"
-ENTRY(divide_error)
- RING0_INT_FRAME
- pushl $0 # no error code
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_divide_error
+KPROBE_ENTRY(page_fault)
+ RING0_EC_FRAME
+ pushl $do_page_fault
CFI_ADJUST_CFA_OFFSET 4
ALIGN
error_code:
@@ -645,6 +654,7 @@
call *%edi
jmp ret_from_exception
CFI_ENDPROC
+KPROBE_END(page_fault)
ENTRY(coprocessor_error)
RING0_INT_FRAME
@@ -669,7 +679,7 @@
pushl $-1 # mark this as an int
CFI_ADJUST_CFA_OFFSET 4
SAVE_ALL
- movl %cr0, %eax
+ GET_CR0_INTO_EAX
testl $0x4, %eax # EM (math emulation bit)
jne device_not_available_emulate
preempt_stop
@@ -702,9 +712,15 @@
jne ok; \
label: \
movl TSS_sysenter_esp0+offset(%esp),%esp; \
+ CFI_DEF_CFA esp, 0; \
+ CFI_UNDEFINED eip; \
pushfl; \
+ CFI_ADJUST_CFA_OFFSET 4; \
pushl $__KERNEL_CS; \
- pushl $sysenter_past_esp
+ CFI_ADJUST_CFA_OFFSET 4; \
+ pushl $sysenter_past_esp; \
+ CFI_ADJUST_CFA_OFFSET 4; \
+ CFI_REL_OFFSET eip, 0
KPROBE_ENTRY(debug)
RING0_INT_FRAME
@@ -720,7 +736,8 @@
call do_debug
jmp ret_from_exception
CFI_ENDPROC
- .previous .text
+KPROBE_END(debug)
+
/*
* NMI is doubly nasty. It can happen _while_ we're handling
* a debug fault, and the debug fault hasn't yet been able to
@@ -729,7 +746,7 @@
* check whether we got an NMI on the debug path where the debug
* fault happened on the sysenter path.
*/
-ENTRY(nmi)
+KPROBE_ENTRY(nmi)
RING0_INT_FRAME
pushl %eax
CFI_ADJUST_CFA_OFFSET 4
@@ -754,6 +771,7 @@
cmpl $sysenter_entry,12(%esp)
je nmi_debug_stack_check
nmi_stack_correct:
+ /* We have a RING0_INT_FRAME here */
pushl %eax
CFI_ADJUST_CFA_OFFSET 4
SAVE_ALL
@@ -764,9 +782,12 @@
CFI_ENDPROC
nmi_stack_fixup:
+ RING0_INT_FRAME
FIX_STACK(12,nmi_stack_correct, 1)
jmp nmi_stack_correct
+
nmi_debug_stack_check:
+ /* We have a RING0_INT_FRAME here */
cmpw $__KERNEL_CS,16(%esp)
jne nmi_stack_correct
cmpl $debug,(%esp)
@@ -777,8 +798,10 @@
jmp nmi_stack_correct
nmi_16bit_stack:
- RING0_INT_FRAME
- /* create the pointer to lss back */
+ /* We have a RING0_INT_FRAME here.
+ *
+ * create the pointer to lss back
+ */
pushl %ss
CFI_ADJUST_CFA_OFFSET 4
pushl %esp
@@ -799,12 +822,13 @@
call do_nmi
RESTORE_REGS
lss 12+4(%esp), %esp # back to 16bit stack
-1: iret
+1: INTERRUPT_RETURN
CFI_ENDPROC
.section __ex_table,"a"
.align 4
.long 1b,iret_exc
.previous
+KPROBE_END(nmi)
KPROBE_ENTRY(int3)
RING0_INT_FRAME
@@ -816,7 +840,7 @@
call do_int3
jmp ret_from_exception
CFI_ENDPROC
- .previous .text
+KPROBE_END(int3)
ENTRY(overflow)
RING0_INT_FRAME
@@ -881,7 +905,7 @@
CFI_ADJUST_CFA_OFFSET 4
jmp error_code
CFI_ENDPROC
- .previous .text
+KPROBE_END(general_protection)
ENTRY(alignment_check)
RING0_EC_FRAME
@@ -890,13 +914,14 @@
jmp error_code
CFI_ENDPROC
-KPROBE_ENTRY(page_fault)
- RING0_EC_FRAME
- pushl $do_page_fault
+ENTRY(divide_error)
+ RING0_INT_FRAME
+ pushl $0 # no error code
+ CFI_ADJUST_CFA_OFFSET 4
+ pushl $do_divide_error
CFI_ADJUST_CFA_OFFSET 4
jmp error_code
CFI_ENDPROC
- .previous .text
#ifdef CONFIG_X86_MCE
ENTRY(machine_check)
@@ -949,6 +974,19 @@
ENDPROC(arch_unwind_init_running)
#endif
+ENTRY(kernel_thread_helper)
+ pushl $0 # fake return address for unwinder
+ CFI_STARTPROC
+ movl %edx,%eax
+ push %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ call *%ebx
+ push %eax
+ CFI_ADJUST_CFA_OFFSET 4
+ call do_exit
+ CFI_ENDPROC
+ENDPROC(kernel_thread_helper)
+
.section .rodata,"a"
#include "syscall_table.S"
diff --git a/arch/i386/kernel/head.S b/arch/i386/kernel/head.S
index a6b8bd8..be9d883 100644
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -371,8 +371,65 @@
addl $8,%edi
dec %ecx
jne rp_sidt
+
+.macro set_early_handler handler,trapno
+ lea \handler,%edx
+ movl $(__KERNEL_CS << 16),%eax
+ movw %dx,%ax
+ movw $0x8E00,%dx /* interrupt gate - dpl=0, present */
+ lea idt_table,%edi
+ movl %eax,8*\trapno(%edi)
+ movl %edx,8*\trapno+4(%edi)
+.endm
+
+ set_early_handler handler=early_divide_err,trapno=0
+ set_early_handler handler=early_illegal_opcode,trapno=6
+ set_early_handler handler=early_protection_fault,trapno=13
+ set_early_handler handler=early_page_fault,trapno=14
+
ret
+early_divide_err:
+ xor %edx,%edx
+ pushl $0 /* fake errcode */
+ jmp early_fault
+
+early_illegal_opcode:
+ movl $6,%edx
+ pushl $0 /* fake errcode */
+ jmp early_fault
+
+early_protection_fault:
+ movl $13,%edx
+ jmp early_fault
+
+early_page_fault:
+ movl $14,%edx
+ jmp early_fault
+
+early_fault:
+ cld
+#ifdef CONFIG_PRINTK
+ movl $(__KERNEL_DS),%eax
+ movl %eax,%ds
+ movl %eax,%es
+ cmpl $2,early_recursion_flag
+ je hlt_loop
+ incl early_recursion_flag
+ movl %cr2,%eax
+ pushl %eax
+ pushl %edx /* trapno */
+ pushl $fault_msg
+#ifdef CONFIG_EARLY_PRINTK
+ call early_printk
+#else
+ call printk
+#endif
+#endif
+hlt_loop:
+ hlt
+ jmp hlt_loop
+
/* This is the default interrupt "handler" :-) */
ALIGN
ignore_int:
@@ -386,6 +443,9 @@
movl $(__KERNEL_DS),%eax
movl %eax,%ds
movl %eax,%es
+ cmpl $2,early_recursion_flag
+ je hlt_loop
+ incl early_recursion_flag
pushl 16(%esp)
pushl 24(%esp)
pushl 32(%esp)
@@ -431,9 +491,16 @@
ready: .byte 0
+early_recursion_flag:
+ .long 0
+
int_msg:
.asciz "Unknown interrupt or fault at EIP %p %p %p\n"
+fault_msg:
+ .ascii "Int %d: CR2 %p err %p EIP %p CS %p flags %p\n"
+ .asciz "Stack: %p %p %p %p %p %p %p %p\n"
+
/*
* The IDT and GDT 'descriptors' are a strange 48-bit object
* only used by the lidt and lgdt instructions. They are not
diff --git a/arch/i386/kernel/i8259.c b/arch/i386/kernel/i8259.c
index d4756d1..ea5f4e7 100644
--- a/arch/i386/kernel/i8259.c
+++ b/arch/i386/kernel/i8259.c
@@ -45,6 +45,8 @@
#define shutdown_8259A_irq disable_8259A_irq
+static int i8259A_auto_eoi;
+
static void mask_and_ack_8259A(unsigned int);
unsigned int startup_8259A_irq(unsigned int irq)
@@ -253,7 +255,7 @@
static int i8259A_resume(struct sys_device *dev)
{
- init_8259A(0);
+ init_8259A(i8259A_auto_eoi);
restore_ELCR(irq_trigger);
return 0;
}
@@ -301,6 +303,8 @@
{
unsigned long flags;
+ i8259A_auto_eoi = auto_eoi;
+
spin_lock_irqsave(&i8259A_lock, flags);
outb(0xff, PIC_MASTER_IMR); /* mask all of 8259A-1 */
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 4fb32c5..fd0df75 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -40,6 +40,7 @@
#include <asm/nmi.h>
#include <mach_apic.h>
+#include <mach_apicdef.h>
#include "io_ports.h"
@@ -65,7 +66,7 @@
*/
int nr_ioapic_registers[MAX_IO_APICS];
-int disable_timer_pin_1 __initdata;
+static int disable_timer_pin_1 __initdata;
/*
* Rough estimation of how many shared IRQs there are, can
@@ -93,6 +94,34 @@
#define vector_to_irq(vector) (vector)
#endif
+
+union entry_union {
+ struct { u32 w1, w2; };
+ struct IO_APIC_route_entry entry;
+};
+
+static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
+{
+ union entry_union eu;
+ unsigned long flags;
+ spin_lock_irqsave(&ioapic_lock, flags);
+ eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
+ eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+ return eu.entry;
+}
+
+static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
+{
+ unsigned long flags;
+ union entry_union eu;
+ eu.entry = e;
+ spin_lock_irqsave(&ioapic_lock, flags);
+ io_apic_write(apic, 0x10 + 2*pin, eu.w1);
+ io_apic_write(apic, 0x11 + 2*pin, eu.w2);
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
/*
* The common case is 1:1 IRQ<->pin mappings. Sometimes there are
* shared ISA-space IRQs, so we have to support them. We are super
@@ -200,13 +229,9 @@
static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
{
struct IO_APIC_route_entry entry;
- unsigned long flags;
/* Check delivery_mode to be sure we're not clearing an SMI pin */
- spin_lock_irqsave(&ioapic_lock, flags);
- *(((int*)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
- *(((int*)&entry) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ entry = ioapic_read_entry(apic, pin);
if (entry.delivery_mode == dest_SMI)
return;
@@ -215,10 +240,7 @@
*/
memset(&entry, 0, sizeof(entry));
entry.mask = 1;
- spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry) + 0));
- io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry) + 1));
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ ioapic_write_entry(apic, pin, entry);
}
static void clear_IO_APIC (void)
@@ -1283,9 +1305,8 @@
if (!apic && (irq < 16))
disable_8259A_irq(irq);
}
+ ioapic_write_entry(apic, pin, entry);
spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(apic, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(apic, 0x10+2*pin, *(((int *)&entry)+0));
set_native_irq_info(irq, TARGET_CPUS);
spin_unlock_irqrestore(&ioapic_lock, flags);
}
@@ -1301,7 +1322,6 @@
static void __init setup_ExtINT_IRQ0_pin(unsigned int apic, unsigned int pin, int vector)
{
struct IO_APIC_route_entry entry;
- unsigned long flags;
memset(&entry,0,sizeof(entry));
@@ -1331,10 +1351,7 @@
/*
* Add it to the IO-APIC irq-routing table:
*/
- spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(apic, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(apic, 0x10+2*pin, *(((int *)&entry)+0));
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ ioapic_write_entry(apic, pin, entry);
enable_8259A_irq(0);
}
@@ -1444,10 +1461,7 @@
for (i = 0; i <= reg_01.bits.entries; i++) {
struct IO_APIC_route_entry entry;
- spin_lock_irqsave(&ioapic_lock, flags);
- *(((int *)&entry)+0) = io_apic_read(apic, 0x10+i*2);
- *(((int *)&entry)+1) = io_apic_read(apic, 0x11+i*2);
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ entry = ioapic_read_entry(apic, i);
printk(KERN_DEBUG " %02x %03X %02X ",
i,
@@ -1666,10 +1680,7 @@
/* See if any of the pins is in ExtINT mode */
for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
struct IO_APIC_route_entry entry;
- spin_lock_irqsave(&ioapic_lock, flags);
- *(((int *)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
- *(((int *)&entry) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ entry = ioapic_read_entry(apic, pin);
/* If the interrupt line is enabled and in ExtInt mode
@@ -1726,7 +1737,6 @@
*/
if (ioapic_i8259.pin != -1) {
struct IO_APIC_route_entry entry;
- unsigned long flags;
memset(&entry, 0, sizeof(entry));
entry.mask = 0; /* Enabled */
@@ -1743,12 +1753,7 @@
/*
* Add it to the IO-APIC irq-routing table:
*/
- spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(ioapic_i8259.apic, 0x11+2*ioapic_i8259.pin,
- *(((int *)&entry)+1));
- io_apic_write(ioapic_i8259.apic, 0x10+2*ioapic_i8259.pin,
- *(((int *)&entry)+0));
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin, entry);
}
disconnect_bsp_APIC(ioapic_i8259.pin != -1);
}
@@ -2213,17 +2218,13 @@
int apic, pin, i;
struct IO_APIC_route_entry entry0, entry1;
unsigned char save_control, save_freq_select;
- unsigned long flags;
pin = find_isa_irq_pin(8, mp_INT);
apic = find_isa_irq_apic(8, mp_INT);
if (pin == -1)
return;
- spin_lock_irqsave(&ioapic_lock, flags);
- *(((int *)&entry0) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
- *(((int *)&entry0) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ entry0 = ioapic_read_entry(apic, pin);
clear_IO_APIC_pin(apic, pin);
memset(&entry1, 0, sizeof(entry1));
@@ -2236,10 +2237,7 @@
entry1.trigger = 0;
entry1.vector = 0;
- spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry1) + 1));
- io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry1) + 0));
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ ioapic_write_entry(apic, pin, entry1);
save_control = CMOS_READ(RTC_CONTROL);
save_freq_select = CMOS_READ(RTC_FREQ_SELECT);
@@ -2258,10 +2256,7 @@
CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
clear_IO_APIC_pin(apic, pin);
- spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry0) + 1));
- io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry0) + 0));
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ ioapic_write_entry(apic, pin, entry0);
}
int timer_uses_ioapic_pin_0;
@@ -2461,17 +2456,12 @@
{
struct IO_APIC_route_entry *entry;
struct sysfs_ioapic_data *data;
- unsigned long flags;
int i;
data = container_of(dev, struct sysfs_ioapic_data, dev);
entry = data->entry;
- spin_lock_irqsave(&ioapic_lock, flags);
- for (i = 0; i < nr_ioapic_registers[dev->id]; i ++, entry ++ ) {
- *(((int *)entry) + 1) = io_apic_read(dev->id, 0x11 + 2 * i);
- *(((int *)entry) + 0) = io_apic_read(dev->id, 0x10 + 2 * i);
- }
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ for (i = 0; i < nr_ioapic_registers[dev->id]; i ++)
+ entry[i] = ioapic_read_entry(dev->id, i);
return 0;
}
@@ -2493,11 +2483,9 @@
reg_00.bits.ID = mp_ioapics[dev->id].mpc_apicid;
io_apic_write(dev->id, 0, reg_00.raw);
}
- for (i = 0; i < nr_ioapic_registers[dev->id]; i ++, entry ++ ) {
- io_apic_write(dev->id, 0x11+2*i, *(((int *)entry)+1));
- io_apic_write(dev->id, 0x10+2*i, *(((int *)entry)+0));
- }
spin_unlock_irqrestore(&ioapic_lock, flags);
+ for (i = 0; i < nr_ioapic_registers[dev->id]; i ++)
+ ioapic_write_entry(dev->id, i, entry[i]);
return 0;
}
@@ -2694,9 +2682,8 @@
if (!ioapic && (irq < 16))
disable_8259A_irq(irq);
+ ioapic_write_entry(ioapic, pin, entry);
spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(ioapic, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(ioapic, 0x10+2*pin, *(((int *)&entry)+0));
set_native_irq_info(use_pci_vector() ? entry.vector : irq, TARGET_CPUS);
spin_unlock_irqrestore(&ioapic_lock, flags);
@@ -2704,3 +2691,25 @@
}
#endif /* CONFIG_ACPI */
+
+static int __init parse_disable_timer_pin_1(char *arg)
+{
+ disable_timer_pin_1 = 1;
+ return 0;
+}
+early_param("disable_timer_pin_1", parse_disable_timer_pin_1);
+
+static int __init parse_enable_timer_pin_1(char *arg)
+{
+ disable_timer_pin_1 = -1;
+ return 0;
+}
+early_param("enable_timer_pin_1", parse_enable_timer_pin_1);
+
+static int __init parse_noapic(char *arg)
+{
+ /* disable IO-APIC */
+ disable_ioapic_setup();
+ return 0;
+}
+early_param("noapic", parse_noapic);
diff --git a/arch/i386/kernel/machine_kexec.c b/arch/i386/kernel/machine_kexec.c
index 6b1ae6b..91966ba 100644
--- a/arch/i386/kernel/machine_kexec.c
+++ b/arch/i386/kernel/machine_kexec.c
@@ -9,6 +9,7 @@
#include <linux/mm.h>
#include <linux/kexec.h>
#include <linux/delay.h>
+#include <linux/init.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
@@ -20,70 +21,13 @@
#include <asm/system.h>
#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-
-#define L0_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
-#define L1_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
-#define L2_ATTR (_PAGE_PRESENT)
-
-#define LEVEL0_SIZE (1UL << 12UL)
-
-#ifndef CONFIG_X86_PAE
-#define LEVEL1_SIZE (1UL << 22UL)
-static u32 pgtable_level1[1024] PAGE_ALIGNED;
-
-static void identity_map_page(unsigned long address)
-{
- unsigned long level1_index, level2_index;
- u32 *pgtable_level2;
-
- /* Find the current page table */
- pgtable_level2 = __va(read_cr3());
-
- /* Find the indexes of the physical address to identity map */
- level1_index = (address % LEVEL1_SIZE)/LEVEL0_SIZE;
- level2_index = address / LEVEL1_SIZE;
-
- /* Identity map the page table entry */
- pgtable_level1[level1_index] = address | L0_ATTR;
- pgtable_level2[level2_index] = __pa(pgtable_level1) | L1_ATTR;
-
- /* Flush the tlb so the new mapping takes effect.
- * Global tlb entries are not flushed but that is not an issue.
- */
- load_cr3(pgtable_level2);
-}
-
-#else
-#define LEVEL1_SIZE (1UL << 21UL)
-#define LEVEL2_SIZE (1UL << 30UL)
-static u64 pgtable_level1[512] PAGE_ALIGNED;
-static u64 pgtable_level2[512] PAGE_ALIGNED;
-
-static void identity_map_page(unsigned long address)
-{
- unsigned long level1_index, level2_index, level3_index;
- u64 *pgtable_level3;
-
- /* Find the current page table */
- pgtable_level3 = __va(read_cr3());
-
- /* Find the indexes of the physical address to identity map */
- level1_index = (address % LEVEL1_SIZE)/LEVEL0_SIZE;
- level2_index = (address % LEVEL2_SIZE)/LEVEL1_SIZE;
- level3_index = address / LEVEL2_SIZE;
-
- /* Identity map the page table entry */
- pgtable_level1[level1_index] = address | L0_ATTR;
- pgtable_level2[level2_index] = __pa(pgtable_level1) | L1_ATTR;
- set_64bit(&pgtable_level3[level3_index],
- __pa(pgtable_level2) | L2_ATTR);
-
- /* Flush the tlb so the new mapping takes effect.
- * Global tlb entries are not flushed but that is not an issue.
- */
- load_cr3(pgtable_level3);
-}
+static u32 kexec_pgd[1024] PAGE_ALIGNED;
+#ifdef CONFIG_X86_PAE
+static u32 kexec_pmd0[1024] PAGE_ALIGNED;
+static u32 kexec_pmd1[1024] PAGE_ALIGNED;
#endif
+static u32 kexec_pte0[1024] PAGE_ALIGNED;
+static u32 kexec_pte1[1024] PAGE_ALIGNED;
static void set_idt(void *newidt, __u16 limit)
{
@@ -127,16 +71,6 @@
#undef __STR
}
-typedef asmlinkage NORET_TYPE void (*relocate_new_kernel_t)(
- unsigned long indirection_page,
- unsigned long reboot_code_buffer,
- unsigned long start_address,
- unsigned int has_pae) ATTRIB_NORET;
-
-extern const unsigned char relocate_new_kernel[];
-extern void relocate_new_kernel_end(void);
-extern const unsigned int relocate_new_kernel_size;
-
/*
* A architecture hook called to validate the
* proposed image and prepare the control pages
@@ -169,25 +103,29 @@
*/
NORET_TYPE void machine_kexec(struct kimage *image)
{
- unsigned long page_list;
- unsigned long reboot_code_buffer;
-
- relocate_new_kernel_t rnk;
+ unsigned long page_list[PAGES_NR];
+ void *control_page;
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
- /* Compute some offsets */
- reboot_code_buffer = page_to_pfn(image->control_code_page)
- << PAGE_SHIFT;
- page_list = image->head;
+ control_page = page_address(image->control_code_page);
+ memcpy(control_page, relocate_kernel, PAGE_SIZE);
- /* Set up an identity mapping for the reboot_code_buffer */
- identity_map_page(reboot_code_buffer);
-
- /* copy it out */
- memcpy((void *)reboot_code_buffer, relocate_new_kernel,
- relocate_new_kernel_size);
+ page_list[PA_CONTROL_PAGE] = __pa(control_page);
+ page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
+ page_list[PA_PGD] = __pa(kexec_pgd);
+ page_list[VA_PGD] = (unsigned long)kexec_pgd;
+#ifdef CONFIG_X86_PAE
+ page_list[PA_PMD_0] = __pa(kexec_pmd0);
+ page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
+ page_list[PA_PMD_1] = __pa(kexec_pmd1);
+ page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
+#endif
+ page_list[PA_PTE_0] = __pa(kexec_pte0);
+ page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
+ page_list[PA_PTE_1] = __pa(kexec_pte1);
+ page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
/* The segment registers are funny things, they have both a
* visible and an invisible part. Whenever the visible part is
@@ -206,6 +144,28 @@
set_idt(phys_to_virt(0),0);
/* now call it */
- rnk = (relocate_new_kernel_t) reboot_code_buffer;
- (*rnk)(page_list, reboot_code_buffer, image->start, cpu_has_pae);
+ relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
+ image->start, cpu_has_pae);
}
+
+/* crashkernel=size@addr specifies the location to reserve for
+ * a crash kernel. By reserving this memory we guarantee
+ * that linux never sets it up as a DMA target.
+ * Useful for holding code to do something appropriate
+ * after a kernel panic.
+ */
+static int __init parse_crashkernel(char *arg)
+{
+ unsigned long size, base;
+ size = memparse(arg, &arg);
+ if (*arg == '@') {
+ base = memparse(arg+1, &arg);
+ /* FIXME: Do I want a sanity check
+ * to validate the memory range?
+ */
+ crashk_res.start = base;
+ crashk_res.end = base + size - 1;
+ }
+ return 0;
+}
+early_param("crashkernel", parse_crashkernel);
diff --git a/arch/i386/kernel/mca.c b/arch/i386/kernel/mca.c
index cd5456f..eb57a85 100644
--- a/arch/i386/kernel/mca.c
+++ b/arch/i386/kernel/mca.c
@@ -42,6 +42,7 @@
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/mca.h>
+#include <linux/kprobes.h>
#include <asm/system.h>
#include <asm/io.h>
#include <linux/proc_fs.h>
@@ -414,7 +415,8 @@
/*--------------------------------------------------------------------*/
-static void mca_handle_nmi_device(struct mca_device *mca_dev, int check_flag)
+static __kprobes void
+mca_handle_nmi_device(struct mca_device *mca_dev, int check_flag)
{
int slot = mca_dev->slot;
@@ -444,7 +446,7 @@
/*--------------------------------------------------------------------*/
-static int mca_handle_nmi_callback(struct device *dev, void *data)
+static int __kprobes mca_handle_nmi_callback(struct device *dev, void *data)
{
struct mca_device *mca_dev = to_mca_device(dev);
unsigned char pos5;
@@ -462,7 +464,7 @@
return 0;
}
-void mca_handle_nmi(void)
+void __kprobes mca_handle_nmi(void)
{
/* First try - scan the various adapters and see if a specific
* adapter was responsible for the error.
diff --git a/arch/i386/kernel/mpparse.c b/arch/i386/kernel/mpparse.c
index a70b5fa..442aaf8 100644
--- a/arch/i386/kernel/mpparse.c
+++ b/arch/i386/kernel/mpparse.c
@@ -30,6 +30,7 @@
#include <asm/io_apic.h>
#include <mach_apic.h>
+#include <mach_apicdef.h>
#include <mach_mpparse.h>
#include <bios_ebda.h>
@@ -68,7 +69,7 @@
/* Processor that is doing the boot up */
unsigned int boot_cpu_physical_apicid = -1U;
/* Internal processor count */
-static unsigned int __devinitdata num_processors;
+unsigned int __cpuinitdata num_processors;
/* Bitmask of physically existing CPUs */
physid_mask_t phys_cpu_present_map;
@@ -228,12 +229,14 @@
mpc_oem_bus_info(m, str, translation_table[mpc_record]);
+#if MAX_MP_BUSSES < 256
if (m->mpc_busid >= MAX_MP_BUSSES) {
printk(KERN_WARNING "MP table busid value (%d) for bustype %s "
" is too large, max. supported is %d\n",
m->mpc_busid, str, MAX_MP_BUSSES - 1);
return;
}
+#endif
if (strncmp(str, BUSTYPE_ISA, sizeof(BUSTYPE_ISA)-1) == 0) {
mp_bus_id_to_type[m->mpc_busid] = MP_BUS_ISA;
@@ -293,19 +296,6 @@
m->mpc_irqtype, m->mpc_irqflag & 3,
(m->mpc_irqflag >> 2) &3, m->mpc_srcbusid,
m->mpc_srcbusirq, m->mpc_destapic, m->mpc_destapiclint);
- /*
- * Well it seems all SMP boards in existence
- * use ExtINT/LVT1 == LINT0 and
- * NMI/LVT2 == LINT1 - the following check
- * will show us if this assumptions is false.
- * Until then we do not have to add baggage.
- */
- if ((m->mpc_irqtype == mp_ExtINT) &&
- (m->mpc_destapiclint != 0))
- BUG();
- if ((m->mpc_irqtype == mp_NMI) &&
- (m->mpc_destapiclint != 1))
- BUG();
}
#ifdef CONFIG_X86_NUMAQ
@@ -822,8 +812,7 @@
#ifdef CONFIG_ACPI
-void __init mp_register_lapic_address (
- u64 address)
+void __init mp_register_lapic_address(u64 address)
{
mp_lapic_addr = (unsigned long) address;
@@ -835,13 +824,10 @@
Dprintk("Boot CPU = %d\n", boot_cpu_physical_apicid);
}
-
-void __devinit mp_register_lapic (
- u8 id,
- u8 enabled)
+void __devinit mp_register_lapic (u8 id, u8 enabled)
{
struct mpc_config_processor processor;
- int boot_cpu = 0;
+ int boot_cpu = 0;
if (MAX_APICS - id <= 0) {
printk(KERN_WARNING "Processor #%d invalid (max %d)\n",
@@ -878,11 +864,9 @@
u32 pin_programmed[4];
} mp_ioapic_routing[MAX_IO_APICS];
-
-static int mp_find_ioapic (
- int gsi)
+static int mp_find_ioapic (int gsi)
{
- int i = 0;
+ int i = 0;
/* Find the IOAPIC that manages this GSI. */
for (i = 0; i < nr_ioapics; i++) {
@@ -895,15 +879,11 @@
return -1;
}
-
-void __init mp_register_ioapic (
- u8 id,
- u32 address,
- u32 gsi_base)
+void __init mp_register_ioapic(u8 id, u32 address, u32 gsi_base)
{
- int idx = 0;
- int tmpid;
+ int idx = 0;
+ int tmpid;
if (nr_ioapics >= MAX_IO_APICS) {
printk(KERN_ERR "ERROR: Max # of I/O APICs (%d) exceeded "
@@ -949,16 +929,10 @@
mp_ioapics[idx].mpc_apicver, mp_ioapics[idx].mpc_apicaddr,
mp_ioapic_routing[idx].gsi_base,
mp_ioapic_routing[idx].gsi_end);
-
- return;
}
-
-void __init mp_override_legacy_irq (
- u8 bus_irq,
- u8 polarity,
- u8 trigger,
- u32 gsi)
+void __init
+mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger, u32 gsi)
{
struct mpc_config_intsrc intsrc;
int ioapic = -1;
@@ -996,15 +970,13 @@
mp_irqs[mp_irq_entries] = intsrc;
if (++mp_irq_entries == MAX_IRQ_SOURCES)
panic("Max # of irq sources exceeded!\n");
-
- return;
}
void __init mp_config_acpi_legacy_irqs (void)
{
struct mpc_config_intsrc intsrc;
- int i = 0;
- int ioapic = -1;
+ int i = 0;
+ int ioapic = -1;
/*
* Fabricate the legacy ISA bus (bus #31).
@@ -1073,12 +1045,12 @@
#define MAX_GSI_NUM 4096
-int mp_register_gsi (u32 gsi, int triggering, int polarity)
+int mp_register_gsi(u32 gsi, int triggering, int polarity)
{
- int ioapic = -1;
- int ioapic_pin = 0;
- int idx, bit = 0;
- static int pci_irq = 16;
+ int ioapic = -1;
+ int ioapic_pin = 0;
+ int idx, bit = 0;
+ static int pci_irq = 16;
/*
* Mapping between Global System Interrups, which
* represent all possible interrupts, and IRQs
diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c
index acb3514..dbda706 100644
--- a/arch/i386/kernel/nmi.c
+++ b/arch/i386/kernel/nmi.c
@@ -21,83 +21,174 @@
#include <linux/sysdev.h>
#include <linux/sysctl.h>
#include <linux/percpu.h>
+#include <linux/dmi.h>
+#include <linux/kprobes.h>
#include <asm/smp.h>
#include <asm/nmi.h>
+#include <asm/kdebug.h>
#include <asm/intel_arch_perfmon.h>
#include "mach_traps.h"
-unsigned int nmi_watchdog = NMI_NONE;
-extern int unknown_nmi_panic;
-static unsigned int nmi_hz = HZ;
-static unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */
-static unsigned int nmi_p4_cccr_val;
-extern void show_registers(struct pt_regs *regs);
-
-/*
- * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
- * - it may be reserved by some other driver, or not
- * - when not reserved by some other driver, it may be used for
- * the NMI watchdog, or not
- *
- * This is maintained separately from nmi_active because the NMI
- * watchdog may also be driven from the I/O APIC timer.
+/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
+ * evtsel_nmi_owner tracks the ownership of the event selection
+ * - different performance counters/ event selection may be reserved for
+ * different subsystems this reservation system just tries to coordinate
+ * things a little
*/
-static DEFINE_SPINLOCK(lapic_nmi_owner_lock);
-static unsigned int lapic_nmi_owner;
-#define LAPIC_NMI_WATCHDOG (1<<0)
-#define LAPIC_NMI_RESERVED (1<<1)
+static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner);
+static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[3]);
+
+/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
+ * offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
+ */
+#define NMI_MAX_COUNTER_BITS 66
/* nmi_active:
- * +1: the lapic NMI watchdog is active, but can be disabled
- * 0: the lapic NMI watchdog has not been set up, and cannot
+ * >0: the lapic NMI watchdog is active, but can be disabled
+ * <0: the lapic NMI watchdog has not been set up, and cannot
* be enabled
- * -1: the lapic NMI watchdog is disabled, but can be enabled
+ * 0: the lapic NMI watchdog is disabled, but can be enabled
*/
-int nmi_active;
+atomic_t nmi_active = ATOMIC_INIT(0); /* oprofile uses this */
-#define K7_EVNTSEL_ENABLE (1 << 22)
-#define K7_EVNTSEL_INT (1 << 20)
-#define K7_EVNTSEL_OS (1 << 17)
-#define K7_EVNTSEL_USR (1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
-#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+unsigned int nmi_watchdog = NMI_DEFAULT;
+static unsigned int nmi_hz = HZ;
-#define P6_EVNTSEL0_ENABLE (1 << 22)
-#define P6_EVNTSEL_INT (1 << 20)
-#define P6_EVNTSEL_OS (1 << 17)
-#define P6_EVNTSEL_USR (1 << 16)
-#define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79
-#define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED
+struct nmi_watchdog_ctlblk {
+ int enabled;
+ u64 check_bit;
+ unsigned int cccr_msr;
+ unsigned int perfctr_msr; /* the MSR to reset in NMI handler */
+ unsigned int evntsel_msr; /* the MSR to select the events to handle */
+};
+static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
-#define MSR_P4_MISC_ENABLE 0x1A0
-#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7)
-#define MSR_P4_MISC_ENABLE_PEBS_UNAVAIL (1<<12)
-#define MSR_P4_PERFCTR0 0x300
-#define MSR_P4_CCCR0 0x360
-#define P4_ESCR_EVENT_SELECT(N) ((N)<<25)
-#define P4_ESCR_OS (1<<3)
-#define P4_ESCR_USR (1<<2)
-#define P4_CCCR_OVF_PMI0 (1<<26)
-#define P4_CCCR_OVF_PMI1 (1<<27)
-#define P4_CCCR_THRESHOLD(N) ((N)<<20)
-#define P4_CCCR_COMPLEMENT (1<<19)
-#define P4_CCCR_COMPARE (1<<18)
-#define P4_CCCR_REQUIRED (3<<16)
-#define P4_CCCR_ESCR_SELECT(N) ((N)<<13)
-#define P4_CCCR_ENABLE (1<<12)
-/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
- CRU_ESCR0 (with any non-null event selector) through a complemented
- max threshold. [IA32-Vol3, Section 14.9.9] */
-#define MSR_P4_IQ_COUNTER0 0x30C
-#define P4_NMI_CRU_ESCR0 (P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS|P4_ESCR_USR)
-#define P4_NMI_IQ_CCCR0 \
- (P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT| \
- P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
+/* local prototypes */
+static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
-#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
-#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
+extern void show_registers(struct pt_regs *regs);
+extern int unknown_nmi_panic;
+
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
+{
+ /* returns the bit offset of the performance counter register */
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ return (msr - MSR_K7_PERFCTR0);
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
+ return (msr - MSR_ARCH_PERFMON_PERFCTR0);
+
+ switch (boot_cpu_data.x86) {
+ case 6:
+ return (msr - MSR_P6_PERFCTR0);
+ case 15:
+ return (msr - MSR_P4_BPU_PERFCTR0);
+ }
+ }
+ return 0;
+}
+
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
+{
+ /* returns the bit offset of the event selection register */
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ return (msr - MSR_K7_EVNTSEL0);
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
+ return (msr - MSR_ARCH_PERFMON_EVENTSEL0);
+
+ switch (boot_cpu_data.x86) {
+ case 6:
+ return (msr - MSR_P6_EVNTSEL0);
+ case 15:
+ return (msr - MSR_P4_BSU_ESCR0);
+ }
+ }
+ return 0;
+}
+
+/* checks for a bit availability (hack for oprofile) */
+int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
+{
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+}
+
+/* checks the an msr for availability */
+int avail_to_resrv_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+}
+
+int reserve_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ if (!test_and_set_bit(counter, &__get_cpu_var(perfctr_nmi_owner)))
+ return 1;
+ return 0;
+}
+
+void release_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ clear_bit(counter, &__get_cpu_var(perfctr_nmi_owner));
+}
+
+int reserve_evntsel_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_evntsel_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ if (!test_and_set_bit(counter, &__get_cpu_var(evntsel_nmi_owner)[0]))
+ return 1;
+ return 0;
+}
+
+void release_evntsel_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_evntsel_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ clear_bit(counter, &__get_cpu_var(evntsel_nmi_owner)[0]);
+}
+
+static __cpuinit inline int nmi_known_cpu(void)
+{
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6));
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
+ return 1;
+ else
+ return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6));
+ }
+ return 0;
+}
#ifdef CONFIG_SMP
/* The performance counters used by NMI_LOCAL_APIC don't trigger when
@@ -125,7 +216,18 @@
unsigned int *prev_nmi_count;
int cpu;
- if (nmi_watchdog == NMI_NONE)
+ /* Enable NMI watchdog for newer systems.
+ Actually it should be safe for most systems before 2004 too except
+ for some IBM systems that corrupt registers when NMI happens
+ during SMM. Unfortunately we don't have more exact information
+ on these and use this coarse check. */
+ if (nmi_watchdog == NMI_DEFAULT && dmi_get_year(DMI_BIOS_DATE) >= 2004)
+ nmi_watchdog = NMI_LOCAL_APIC;
+
+ if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DEFAULT))
+ return 0;
+
+ if (!atomic_read(&nmi_active))
return 0;
prev_nmi_count = kmalloc(NR_CPUS * sizeof(int), GFP_KERNEL);
@@ -149,25 +251,45 @@
if (!cpu_isset(cpu, cpu_callin_map))
continue;
#endif
+ if (!per_cpu(nmi_watchdog_ctlblk, cpu).enabled)
+ continue;
if (nmi_count(cpu) - prev_nmi_count[cpu] <= 5) {
- endflag = 1;
printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
cpu,
prev_nmi_count[cpu],
nmi_count(cpu));
- nmi_active = 0;
- lapic_nmi_owner &= ~LAPIC_NMI_WATCHDOG;
- kfree(prev_nmi_count);
- return -1;
+ per_cpu(nmi_watchdog_ctlblk, cpu).enabled = 0;
+ atomic_dec(&nmi_active);
}
}
+ if (!atomic_read(&nmi_active)) {
+ kfree(prev_nmi_count);
+ atomic_set(&nmi_active, -1);
+ return -1;
+ }
endflag = 1;
printk("OK.\n");
/* now that we know it works we can reduce NMI frequency to
something more reasonable; makes a difference in some configs */
- if (nmi_watchdog == NMI_LOCAL_APIC)
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
nmi_hz = 1;
+ /*
+ * On Intel CPUs with ARCH_PERFMON only 32 bits in the counter
+ * are writable, with higher bits sign extending from bit 31.
+ * So, we can only program the counter with 31 bit values and
+ * 32nd bit should be 1, for 33.. to be 1.
+ * Find the appropriate nmi_hz
+ */
+ if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 &&
+ ((u64)cpu_khz * 1000) > 0x7fffffffULL) {
+ u64 count = (u64)cpu_khz * 1000;
+ do_div(count, 0x7fffffffUL);
+ nmi_hz = count + 1;
+ }
+ }
kfree(prev_nmi_count);
return 0;
@@ -181,124 +303,70 @@
get_option(&str, &nmi);
- if (nmi >= NMI_INVALID)
+ if ((nmi >= NMI_INVALID) || (nmi < NMI_NONE))
return 0;
- if (nmi == NMI_NONE)
- nmi_watchdog = nmi;
/*
* If any other x86 CPU has a local APIC, then
* please test the NMI stuff there and send me the
* missing bits. Right now Intel P6/P4 and AMD K7 only.
*/
- if ((nmi == NMI_LOCAL_APIC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
- (boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15))
- nmi_watchdog = nmi;
- if ((nmi == NMI_LOCAL_APIC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
- (boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15))
- nmi_watchdog = nmi;
- /*
- * We can enable the IO-APIC watchdog
- * unconditionally.
- */
- if (nmi == NMI_IO_APIC) {
- nmi_active = 1;
- nmi_watchdog = nmi;
- }
+ if ((nmi == NMI_LOCAL_APIC) && (nmi_known_cpu() == 0))
+ return 0; /* no lapic support */
+ nmi_watchdog = nmi;
return 1;
}
__setup("nmi_watchdog=", setup_nmi_watchdog);
-static void disable_intel_arch_watchdog(void);
-
static void disable_lapic_nmi_watchdog(void)
{
- if (nmi_active <= 0)
+ BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+ if (atomic_read(&nmi_active) <= 0)
return;
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- wrmsr(MSR_K7_EVNTSEL0, 0, 0);
- break;
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
- disable_intel_arch_watchdog();
- break;
- }
- switch (boot_cpu_data.x86) {
- case 6:
- if (boot_cpu_data.x86_model > 0xd)
- break;
- wrmsr(MSR_P6_EVNTSEL0, 0, 0);
- break;
- case 15:
- if (boot_cpu_data.x86_model > 0x4)
- break;
+ on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
- wrmsr(MSR_P4_IQ_CCCR0, 0, 0);
- wrmsr(MSR_P4_CRU_ESCR0, 0, 0);
- break;
- }
- break;
- }
- nmi_active = -1;
- /* tell do_nmi() and others that we're not active any more */
- nmi_watchdog = 0;
+ BUG_ON(atomic_read(&nmi_active) != 0);
}
static void enable_lapic_nmi_watchdog(void)
{
- if (nmi_active < 0) {
- nmi_watchdog = NMI_LOCAL_APIC;
- setup_apic_nmi_watchdog();
- }
-}
+ BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-int reserve_lapic_nmi(void)
-{
- unsigned int old_owner;
+ /* are we already enabled */
+ if (atomic_read(&nmi_active) != 0)
+ return;
- spin_lock(&lapic_nmi_owner_lock);
- old_owner = lapic_nmi_owner;
- lapic_nmi_owner |= LAPIC_NMI_RESERVED;
- spin_unlock(&lapic_nmi_owner_lock);
- if (old_owner & LAPIC_NMI_RESERVED)
- return -EBUSY;
- if (old_owner & LAPIC_NMI_WATCHDOG)
- disable_lapic_nmi_watchdog();
- return 0;
-}
+ /* are we lapic aware */
+ if (nmi_known_cpu() <= 0)
+ return;
-void release_lapic_nmi(void)
-{
- unsigned int new_owner;
-
- spin_lock(&lapic_nmi_owner_lock);
- new_owner = lapic_nmi_owner & ~LAPIC_NMI_RESERVED;
- lapic_nmi_owner = new_owner;
- spin_unlock(&lapic_nmi_owner_lock);
- if (new_owner & LAPIC_NMI_WATCHDOG)
- enable_lapic_nmi_watchdog();
+ on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
+ touch_nmi_watchdog();
}
void disable_timer_nmi_watchdog(void)
{
- if ((nmi_watchdog != NMI_IO_APIC) || (nmi_active <= 0))
+ BUG_ON(nmi_watchdog != NMI_IO_APIC);
+
+ if (atomic_read(&nmi_active) <= 0)
return;
- unset_nmi_callback();
- nmi_active = -1;
- nmi_watchdog = NMI_NONE;
+ disable_irq(0);
+ on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
+
+ BUG_ON(atomic_read(&nmi_active) != 0);
}
void enable_timer_nmi_watchdog(void)
{
- if (nmi_active < 0) {
- nmi_watchdog = NMI_IO_APIC;
+ BUG_ON(nmi_watchdog != NMI_IO_APIC);
+
+ if (atomic_read(&nmi_active) == 0) {
touch_nmi_watchdog();
- nmi_active = 1;
+ on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
+ enable_irq(0);
}
}
@@ -308,15 +376,20 @@
static int lapic_nmi_suspend(struct sys_device *dev, pm_message_t state)
{
- nmi_pm_active = nmi_active;
- disable_lapic_nmi_watchdog();
+ /* only CPU0 goes here, other CPUs should be offline */
+ nmi_pm_active = atomic_read(&nmi_active);
+ stop_apic_nmi_watchdog(NULL);
+ BUG_ON(atomic_read(&nmi_active) != 0);
return 0;
}
static int lapic_nmi_resume(struct sys_device *dev)
{
- if (nmi_pm_active > 0)
- enable_lapic_nmi_watchdog();
+ /* only CPU0 goes here, other CPUs should be offline */
+ if (nmi_pm_active > 0) {
+ setup_apic_nmi_watchdog(NULL);
+ touch_nmi_watchdog();
+ }
return 0;
}
@@ -336,7 +409,13 @@
{
int error;
- if (nmi_active == 0 || nmi_watchdog != NMI_LOCAL_APIC)
+ /* should really be a BUG_ON but b/c this is an
+ * init call, it just doesn't work. -dcz
+ */
+ if (nmi_watchdog != NMI_LOCAL_APIC)
+ return 0;
+
+ if ( atomic_read(&nmi_active) < 0 )
return 0;
error = sysdev_class_register(&nmi_sysclass);
@@ -354,138 +433,269 @@
* Original code written by Keith Owens.
*/
-static void clear_msr_range(unsigned int base, unsigned int n)
-{
- unsigned int i;
-
- for(i = 0; i < n; ++i)
- wrmsr(base+i, 0, 0);
-}
-
-static void write_watchdog_counter(const char *descr)
+static void write_watchdog_counter(unsigned int perfctr_msr, const char *descr)
{
u64 count = (u64)cpu_khz * 1000;
do_div(count, nmi_hz);
if(descr)
Dprintk("setting %s to -0x%08Lx\n", descr, count);
- wrmsrl(nmi_perfctr_msr, 0 - count);
+ wrmsrl(perfctr_msr, 0 - count);
}
-static void setup_k7_watchdog(void)
+/* Note that these events don't tick when the CPU idles. This means
+ the frequency varies with CPU load. */
+
+#define K7_EVNTSEL_ENABLE (1 << 22)
+#define K7_EVNTSEL_INT (1 << 20)
+#define K7_EVNTSEL_OS (1 << 17)
+#define K7_EVNTSEL_USR (1 << 16)
+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
+#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+
+static int setup_k7_watchdog(void)
{
+ unsigned int perfctr_msr, evntsel_msr;
unsigned int evntsel;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
- nmi_perfctr_msr = MSR_K7_PERFCTR0;
+ perfctr_msr = MSR_K7_PERFCTR0;
+ evntsel_msr = MSR_K7_EVNTSEL0;
+ if (!reserve_perfctr_nmi(perfctr_msr))
+ goto fail;
- clear_msr_range(MSR_K7_EVNTSEL0, 4);
- clear_msr_range(MSR_K7_PERFCTR0, 4);
+ if (!reserve_evntsel_nmi(evntsel_msr))
+ goto fail1;
+
+ wrmsrl(perfctr_msr, 0UL);
evntsel = K7_EVNTSEL_INT
| K7_EVNTSEL_OS
| K7_EVNTSEL_USR
| K7_NMI_EVENT;
- wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
- write_watchdog_counter("K7_PERFCTR0");
+ /* setup the timer */
+ wrmsr(evntsel_msr, evntsel, 0);
+ write_watchdog_counter(perfctr_msr, "K7_PERFCTR0");
apic_write(APIC_LVTPC, APIC_DM_NMI);
evntsel |= K7_EVNTSEL_ENABLE;
- wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
+ wrmsr(evntsel_msr, evntsel, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = 0; //unused
+ wd->check_bit = 1ULL<<63;
+ return 1;
+fail1:
+ release_perfctr_nmi(perfctr_msr);
+fail:
+ return 0;
}
-static void setup_p6_watchdog(void)
+static void stop_k7_watchdog(void)
{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ wrmsr(wd->evntsel_msr, 0, 0);
+
+ release_evntsel_nmi(wd->evntsel_msr);
+ release_perfctr_nmi(wd->perfctr_msr);
+}
+
+#define P6_EVNTSEL0_ENABLE (1 << 22)
+#define P6_EVNTSEL_INT (1 << 20)
+#define P6_EVNTSEL_OS (1 << 17)
+#define P6_EVNTSEL_USR (1 << 16)
+#define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79
+#define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED
+
+static int setup_p6_watchdog(void)
+{
+ unsigned int perfctr_msr, evntsel_msr;
unsigned int evntsel;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
- nmi_perfctr_msr = MSR_P6_PERFCTR0;
+ perfctr_msr = MSR_P6_PERFCTR0;
+ evntsel_msr = MSR_P6_EVNTSEL0;
+ if (!reserve_perfctr_nmi(perfctr_msr))
+ goto fail;
- clear_msr_range(MSR_P6_EVNTSEL0, 2);
- clear_msr_range(MSR_P6_PERFCTR0, 2);
+ if (!reserve_evntsel_nmi(evntsel_msr))
+ goto fail1;
+
+ wrmsrl(perfctr_msr, 0UL);
evntsel = P6_EVNTSEL_INT
| P6_EVNTSEL_OS
| P6_EVNTSEL_USR
| P6_NMI_EVENT;
- wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
- write_watchdog_counter("P6_PERFCTR0");
+ /* setup the timer */
+ wrmsr(evntsel_msr, evntsel, 0);
+ write_watchdog_counter(perfctr_msr, "P6_PERFCTR0");
apic_write(APIC_LVTPC, APIC_DM_NMI);
evntsel |= P6_EVNTSEL0_ENABLE;
- wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
+ wrmsr(evntsel_msr, evntsel, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = 0; //unused
+ wd->check_bit = 1ULL<<39;
+ return 1;
+fail1:
+ release_perfctr_nmi(perfctr_msr);
+fail:
+ return 0;
}
+static void stop_p6_watchdog(void)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ wrmsr(wd->evntsel_msr, 0, 0);
+
+ release_evntsel_nmi(wd->evntsel_msr);
+ release_perfctr_nmi(wd->perfctr_msr);
+}
+
+/* Note that these events don't tick when the CPU idles. This means
+ the frequency varies with CPU load. */
+
+#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7)
+#define P4_ESCR_EVENT_SELECT(N) ((N)<<25)
+#define P4_ESCR_OS (1<<3)
+#define P4_ESCR_USR (1<<2)
+#define P4_CCCR_OVF_PMI0 (1<<26)
+#define P4_CCCR_OVF_PMI1 (1<<27)
+#define P4_CCCR_THRESHOLD(N) ((N)<<20)
+#define P4_CCCR_COMPLEMENT (1<<19)
+#define P4_CCCR_COMPARE (1<<18)
+#define P4_CCCR_REQUIRED (3<<16)
+#define P4_CCCR_ESCR_SELECT(N) ((N)<<13)
+#define P4_CCCR_ENABLE (1<<12)
+#define P4_CCCR_OVF (1<<31)
+/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
+ CRU_ESCR0 (with any non-null event selector) through a complemented
+ max threshold. [IA32-Vol3, Section 14.9.9] */
+
static int setup_p4_watchdog(void)
{
+ unsigned int perfctr_msr, evntsel_msr, cccr_msr;
+ unsigned int evntsel, cccr_val;
unsigned int misc_enable, dummy;
+ unsigned int ht_num;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
- rdmsr(MSR_P4_MISC_ENABLE, misc_enable, dummy);
+ rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
return 0;
- nmi_perfctr_msr = MSR_P4_IQ_COUNTER0;
- nmi_p4_cccr_val = P4_NMI_IQ_CCCR0;
#ifdef CONFIG_SMP
- if (smp_num_siblings == 2)
- nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1;
+ /* detect which hyperthread we are on */
+ if (smp_num_siblings == 2) {
+ unsigned int ebx, apicid;
+
+ ebx = cpuid_ebx(1);
+ apicid = (ebx >> 24) & 0xff;
+ ht_num = apicid & 1;
+ } else
#endif
+ ht_num = 0;
- if (!(misc_enable & MSR_P4_MISC_ENABLE_PEBS_UNAVAIL))
- clear_msr_range(0x3F1, 2);
- /* MSR 0x3F0 seems to have a default value of 0xFC00, but current
- docs doesn't fully define it, so leave it alone for now. */
- if (boot_cpu_data.x86_model >= 0x3) {
- /* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
- clear_msr_range(0x3A0, 26);
- clear_msr_range(0x3BC, 3);
- } else {
- clear_msr_range(0x3A0, 31);
- }
- clear_msr_range(0x3C0, 6);
- clear_msr_range(0x3C8, 6);
- clear_msr_range(0x3E0, 2);
- clear_msr_range(MSR_P4_CCCR0, 18);
- clear_msr_range(MSR_P4_PERFCTR0, 18);
-
- wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0);
- wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0);
- write_watchdog_counter("P4_IQ_COUNTER0");
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
- return 1;
-}
-
-static void disable_intel_arch_watchdog(void)
-{
- unsigned ebx;
-
- /*
- * Check whether the Architectural PerfMon supports
- * Unhalted Core Cycles Event or not.
- * NOTE: Corresponding bit = 0 in ebp indicates event present.
+ /* performance counters are shared resources
+ * assign each hyperthread its own set
+ * (re-use the ESCR0 register, seems safe
+ * and keeps the cccr_val the same)
*/
- ebx = cpuid_ebx(10);
- if (!(ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
- wrmsr(MSR_ARCH_PERFMON_EVENTSEL0, 0, 0);
+ if (!ht_num) {
+ /* logical cpu 0 */
+ perfctr_msr = MSR_P4_IQ_PERFCTR0;
+ evntsel_msr = MSR_P4_CRU_ESCR0;
+ cccr_msr = MSR_P4_IQ_CCCR0;
+ cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+ } else {
+ /* logical cpu 1 */
+ perfctr_msr = MSR_P4_IQ_PERFCTR1;
+ evntsel_msr = MSR_P4_CRU_ESCR0;
+ cccr_msr = MSR_P4_IQ_CCCR1;
+ cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
+ }
+
+ if (!reserve_perfctr_nmi(perfctr_msr))
+ goto fail;
+
+ if (!reserve_evntsel_nmi(evntsel_msr))
+ goto fail1;
+
+ evntsel = P4_ESCR_EVENT_SELECT(0x3F)
+ | P4_ESCR_OS
+ | P4_ESCR_USR;
+
+ cccr_val |= P4_CCCR_THRESHOLD(15)
+ | P4_CCCR_COMPLEMENT
+ | P4_CCCR_COMPARE
+ | P4_CCCR_REQUIRED;
+
+ wrmsr(evntsel_msr, evntsel, 0);
+ wrmsr(cccr_msr, cccr_val, 0);
+ write_watchdog_counter(perfctr_msr, "P4_IQ_COUNTER0");
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ cccr_val |= P4_CCCR_ENABLE;
+ wrmsr(cccr_msr, cccr_val, 0);
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = cccr_msr;
+ wd->check_bit = 1ULL<<39;
+ return 1;
+fail1:
+ release_perfctr_nmi(perfctr_msr);
+fail:
+ return 0;
}
+static void stop_p4_watchdog(void)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ wrmsr(wd->cccr_msr, 0, 0);
+ wrmsr(wd->evntsel_msr, 0, 0);
+
+ release_evntsel_nmi(wd->evntsel_msr);
+ release_perfctr_nmi(wd->perfctr_msr);
+}
+
+#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
+#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
+
static int setup_intel_arch_watchdog(void)
{
+ unsigned int ebx;
+ union cpuid10_eax eax;
+ unsigned int unused;
+ unsigned int perfctr_msr, evntsel_msr;
unsigned int evntsel;
- unsigned ebx;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
/*
* Check whether the Architectural PerfMon supports
* Unhalted Core Cycles Event or not.
- * NOTE: Corresponding bit = 0 in ebp indicates event present.
+ * NOTE: Corresponding bit = 0 in ebx indicates event present.
*/
- ebx = cpuid_ebx(10);
- if ((ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
- return 0;
+ cpuid(10, &(eax.full), &ebx, &unused, &unused);
+ if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
+ (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
+ goto fail;
- nmi_perfctr_msr = MSR_ARCH_PERFMON_PERFCTR0;
+ perfctr_msr = MSR_ARCH_PERFMON_PERFCTR0;
+ evntsel_msr = MSR_ARCH_PERFMON_EVENTSEL0;
- clear_msr_range(MSR_ARCH_PERFMON_EVENTSEL0, 2);
- clear_msr_range(MSR_ARCH_PERFMON_PERFCTR0, 2);
+ if (!reserve_perfctr_nmi(perfctr_msr))
+ goto fail;
+
+ if (!reserve_evntsel_nmi(evntsel_msr))
+ goto fail1;
+
+ wrmsrl(perfctr_msr, 0UL);
evntsel = ARCH_PERFMON_EVENTSEL_INT
| ARCH_PERFMON_EVENTSEL_OS
@@ -493,51 +703,145 @@
| ARCH_PERFMON_NMI_EVENT_SEL
| ARCH_PERFMON_NMI_EVENT_UMASK;
- wrmsr(MSR_ARCH_PERFMON_EVENTSEL0, evntsel, 0);
- write_watchdog_counter("INTEL_ARCH_PERFCTR0");
+ /* setup the timer */
+ wrmsr(evntsel_msr, evntsel, 0);
+ write_watchdog_counter(perfctr_msr, "INTEL_ARCH_PERFCTR0");
apic_write(APIC_LVTPC, APIC_DM_NMI);
evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE;
- wrmsr(MSR_ARCH_PERFMON_EVENTSEL0, evntsel, 0);
+ wrmsr(evntsel_msr, evntsel, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = 0; //unused
+ wd->check_bit = 1ULL << (eax.split.bit_width - 1);
return 1;
+fail1:
+ release_perfctr_nmi(perfctr_msr);
+fail:
+ return 0;
}
-void setup_apic_nmi_watchdog (void)
+static void stop_intel_arch_watchdog(void)
{
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
- return;
- setup_k7_watchdog();
- break;
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
- if (!setup_intel_arch_watchdog())
+ unsigned int ebx;
+ union cpuid10_eax eax;
+ unsigned int unused;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ /*
+ * Check whether the Architectural PerfMon supports
+ * Unhalted Core Cycles Event or not.
+ * NOTE: Corresponding bit = 0 in ebx indicates event present.
+ */
+ cpuid(10, &(eax.full), &ebx, &unused, &unused);
+ if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
+ (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
+ return;
+
+ wrmsr(wd->evntsel_msr, 0, 0);
+ release_evntsel_nmi(wd->evntsel_msr);
+ release_perfctr_nmi(wd->perfctr_msr);
+}
+
+void setup_apic_nmi_watchdog (void *unused)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ /* only support LOCAL and IO APICs for now */
+ if ((nmi_watchdog != NMI_LOCAL_APIC) &&
+ (nmi_watchdog != NMI_IO_APIC))
+ return;
+
+ if (wd->enabled == 1)
+ return;
+
+ /* cheap hack to support suspend/resume */
+ /* if cpu0 is not active neither should the other cpus */
+ if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
+ return;
+
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
+ return;
+ if (!setup_k7_watchdog())
return;
break;
- }
- switch (boot_cpu_data.x86) {
- case 6:
- if (boot_cpu_data.x86_model > 0xd)
- return;
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+ if (!setup_intel_arch_watchdog())
+ return;
+ break;
+ }
+ switch (boot_cpu_data.x86) {
+ case 6:
+ if (boot_cpu_data.x86_model > 0xd)
+ return;
- setup_p6_watchdog();
- break;
- case 15:
- if (boot_cpu_data.x86_model > 0x4)
- return;
+ if (!setup_p6_watchdog())
+ return;
+ break;
+ case 15:
+ if (boot_cpu_data.x86_model > 0x4)
+ return;
- if (!setup_p4_watchdog())
+ if (!setup_p4_watchdog())
+ return;
+ break;
+ default:
return;
+ }
break;
default:
return;
}
- break;
- default:
- return;
}
- lapic_nmi_owner = LAPIC_NMI_WATCHDOG;
- nmi_active = 1;
+ wd->enabled = 1;
+ atomic_inc(&nmi_active);
+}
+
+void stop_apic_nmi_watchdog(void *unused)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ /* only support LOCAL and IO APICs for now */
+ if ((nmi_watchdog != NMI_LOCAL_APIC) &&
+ (nmi_watchdog != NMI_IO_APIC))
+ return;
+
+ if (wd->enabled == 0)
+ return;
+
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ stop_k7_watchdog();
+ break;
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+ stop_intel_arch_watchdog();
+ break;
+ }
+ switch (boot_cpu_data.x86) {
+ case 6:
+ if (boot_cpu_data.x86_model > 0xd)
+ break;
+ stop_p6_watchdog();
+ break;
+ case 15:
+ if (boot_cpu_data.x86_model > 0x4)
+ break;
+ stop_p4_watchdog();
+ break;
+ }
+ break;
+ default:
+ return;
+ }
+ }
+ wd->enabled = 0;
+ atomic_dec(&nmi_active);
}
/*
@@ -579,7 +883,7 @@
extern void die_nmi(struct pt_regs *, const char *msg);
-void nmi_watchdog_tick (struct pt_regs * regs)
+__kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
{
/*
@@ -588,11 +892,23 @@
* smp_processor_id().
*/
unsigned int sum;
+ int touched = 0;
int cpu = smp_processor_id();
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+ u64 dummy;
+ int rc=0;
+
+ /* check for other users first */
+ if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
+ == NOTIFY_STOP) {
+ rc = 1;
+ touched = 1;
+ }
sum = per_cpu(irq_stat, cpu).apic_timer_irqs;
- if (last_irq_sums[cpu] == sum) {
+ /* if the apic timer isn't firing, this cpu isn't doing much */
+ if (!touched && last_irq_sums[cpu] == sum) {
/*
* Ayiee, looks like this CPU is stuck ...
* wait a few IRQs (5 seconds) before doing the oops ...
@@ -607,27 +923,59 @@
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}
- if (nmi_perfctr_msr) {
- if (nmi_perfctr_msr == MSR_P4_IQ_COUNTER0) {
- /*
- * P4 quirks:
- * - An overflown perfctr will assert its interrupt
- * until the OVF flag in its CCCR is cleared.
- * - LVTPC is masked on interrupt and must be
- * unmasked by the LVTPC handler.
+ /* see if the nmi watchdog went off */
+ if (wd->enabled) {
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ rdmsrl(wd->perfctr_msr, dummy);
+ if (dummy & wd->check_bit){
+ /* this wasn't a watchdog timer interrupt */
+ goto done;
+ }
+
+ /* only Intel P4 uses the cccr msr */
+ if (wd->cccr_msr != 0) {
+ /*
+ * P4 quirks:
+ * - An overflown perfctr will assert its interrupt
+ * until the OVF flag in its CCCR is cleared.
+ * - LVTPC is masked on interrupt and must be
+ * unmasked by the LVTPC handler.
+ */
+ rdmsrl(wd->cccr_msr, dummy);
+ dummy &= ~P4_CCCR_OVF;
+ wrmsrl(wd->cccr_msr, dummy);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ }
+ else if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
+ wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0) {
+ /* P6 based Pentium M need to re-unmask
+ * the apic vector but it doesn't hurt
+ * other P6 variant.
+ * ArchPerfom/Core Duo also needs this */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ }
+ /* start the cycle over again */
+ write_watchdog_counter(wd->perfctr_msr, NULL);
+ rc = 1;
+ } else if (nmi_watchdog == NMI_IO_APIC) {
+ /* don't know how to accurately check for this.
+ * just assume it was a watchdog timer interrupt
+ * This matches the old behaviour.
*/
- wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
- apic_write(APIC_LVTPC, APIC_DM_NMI);
+ rc = 1;
}
- else if (nmi_perfctr_msr == MSR_P6_PERFCTR0 ||
- nmi_perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0) {
- /* Only P6 based Pentium M need to re-unmask
- * the apic vector but it doesn't hurt
- * other P6 variant */
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- }
- write_watchdog_counter(NULL);
}
+done:
+ return rc;
+}
+
+int do_nmi_callback(struct pt_regs * regs, int cpu)
+{
+#ifdef CONFIG_SYSCTL
+ if (unknown_nmi_panic)
+ return unknown_nmi_panic_callback(regs, cpu);
+#endif
+ return 0;
}
#ifdef CONFIG_SYSCTL
@@ -637,36 +985,46 @@
unsigned char reason = get_nmi_reason();
char buf[64];
- if (!(reason & 0xc0)) {
- sprintf(buf, "NMI received for unknown reason %02x\n", reason);
- die_nmi(regs, buf);
- }
+ sprintf(buf, "NMI received for unknown reason %02x\n", reason);
+ die_nmi(regs, buf);
return 0;
}
/*
- * proc handler for /proc/sys/kernel/unknown_nmi_panic
+ * proc handler for /proc/sys/kernel/nmi
*/
-int proc_unknown_nmi_panic(ctl_table *table, int write, struct file *file,
+int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file,
void __user *buffer, size_t *length, loff_t *ppos)
{
int old_state;
- old_state = unknown_nmi_panic;
+ nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0;
+ old_state = nmi_watchdog_enabled;
proc_dointvec(table, write, file, buffer, length, ppos);
- if (!!old_state == !!unknown_nmi_panic)
+ if (!!old_state == !!nmi_watchdog_enabled)
return 0;
- if (unknown_nmi_panic) {
- if (reserve_lapic_nmi() < 0) {
- unknown_nmi_panic = 0;
- return -EBUSY;
- } else {
- set_nmi_callback(unknown_nmi_panic_callback);
- }
+ if (atomic_read(&nmi_active) < 0) {
+ printk( KERN_WARNING "NMI watchdog is permanently disabled\n");
+ return -EIO;
+ }
+
+ if (nmi_watchdog == NMI_DEFAULT) {
+ if (nmi_known_cpu() > 0)
+ nmi_watchdog = NMI_LOCAL_APIC;
+ else
+ nmi_watchdog = NMI_IO_APIC;
+ }
+
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ if (nmi_watchdog_enabled)
+ enable_lapic_nmi_watchdog();
+ else
+ disable_lapic_nmi_watchdog();
} else {
- release_lapic_nmi();
- unset_nmi_callback();
+ printk( KERN_WARNING
+ "NMI watchdog doesn't know what hardware to touch\n");
+ return -EIO;
}
return 0;
}
@@ -675,7 +1033,11 @@
EXPORT_SYMBOL(nmi_active);
EXPORT_SYMBOL(nmi_watchdog);
-EXPORT_SYMBOL(reserve_lapic_nmi);
-EXPORT_SYMBOL(release_lapic_nmi);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
+EXPORT_SYMBOL(reserve_perfctr_nmi);
+EXPORT_SYMBOL(release_perfctr_nmi);
+EXPORT_SYMBOL(reserve_evntsel_nmi);
+EXPORT_SYMBOL(release_evntsel_nmi);
EXPORT_SYMBOL(disable_timer_nmi_watchdog);
EXPORT_SYMBOL(enable_timer_nmi_watchdog);
diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
index 8657c73..8c190ca 100644
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -37,6 +37,7 @@
#include <linux/kallsyms.h>
#include <linux/ptrace.h>
#include <linux/random.h>
+#include <linux/personality.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
@@ -320,15 +321,6 @@
* the "args".
*/
extern void kernel_thread_helper(void);
-__asm__(".section .text\n"
- ".align 4\n"
- "kernel_thread_helper:\n\t"
- "movl %edx,%eax\n\t"
- "pushl %edx\n\t"
- "call *%ebx\n\t"
- "pushl %eax\n\t"
- "call do_exit\n"
- ".previous");
/*
* Create a kernel thread
@@ -346,7 +338,7 @@
regs.xes = __USER_DS;
regs.orig_eax = -1;
regs.eip = (unsigned long) kernel_thread_helper;
- regs.xcs = __KERNEL_CS;
+ regs.xcs = __KERNEL_CS | get_kernel_rpl();
regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;
/* Ok, create the new process.. */
@@ -905,7 +897,7 @@
unsigned long arch_align_stack(unsigned long sp)
{
- if (randomize_va_space)
+ if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
sp -= get_random_int() % 8192;
return sp & ~0xf;
}
diff --git a/arch/i386/kernel/ptrace.c b/arch/i386/kernel/ptrace.c
index d3db03f..775f50e 100644
--- a/arch/i386/kernel/ptrace.c
+++ b/arch/i386/kernel/ptrace.c
@@ -185,17 +185,17 @@
return addr;
}
-static inline int is_at_popf(struct task_struct *child, struct pt_regs *regs)
+static inline int is_setting_trap_flag(struct task_struct *child, struct pt_regs *regs)
{
int i, copied;
- unsigned char opcode[16];
+ unsigned char opcode[15];
unsigned long addr = convert_eip_to_linear(child, regs);
copied = access_process_vm(child, addr, opcode, sizeof(opcode), 0);
for (i = 0; i < copied; i++) {
switch (opcode[i]) {
- /* popf */
- case 0x9d:
+ /* popf and iret */
+ case 0x9d: case 0xcf:
return 1;
/* opcode and address size prefixes */
case 0x66: case 0x67:
@@ -247,7 +247,7 @@
* don't mark it as being "us" that set it, so that we
* won't clear it by hand later.
*/
- if (is_at_popf(child, regs))
+ if (is_setting_trap_flag(child, regs))
return;
child->ptrace |= PT_DTRACE;
diff --git a/arch/i386/kernel/relocate_kernel.S b/arch/i386/kernel/relocate_kernel.S
index d312616..f151d6f 100644
--- a/arch/i386/kernel/relocate_kernel.S
+++ b/arch/i386/kernel/relocate_kernel.S
@@ -7,16 +7,138 @@
*/
#include <linux/linkage.h>
+#include <asm/page.h>
+#include <asm/kexec.h>
- /*
- * Must be relocatable PIC code callable as a C function, that once
- * it starts can not use the previous processes stack.
- */
- .globl relocate_new_kernel
+/*
+ * Must be relocatable PIC code callable as a C function
+ */
+
+#define PTR(x) (x << 2)
+#define PAGE_ALIGNED (1 << PAGE_SHIFT)
+#define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */
+#define PAE_PGD_ATTR 0x01 /* _PAGE_PRESENT */
+
+ .text
+ .align PAGE_ALIGNED
+ .globl relocate_kernel
+relocate_kernel:
+ movl 8(%esp), %ebp /* list of pages */
+
+#ifdef CONFIG_X86_PAE
+ /* map the control page at its virtual address */
+
+ movl PTR(VA_PGD)(%ebp), %edi
+ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax
+ andl $0xc0000000, %eax
+ shrl $27, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_PMD_0)(%ebp), %edx
+ orl $PAE_PGD_ATTR, %edx
+ movl %edx, (%eax)
+
+ movl PTR(VA_PMD_0)(%ebp), %edi
+ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax
+ andl $0x3fe00000, %eax
+ shrl $18, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_PTE_0)(%ebp), %edx
+ orl $PAGE_ATTR, %edx
+ movl %edx, (%eax)
+
+ movl PTR(VA_PTE_0)(%ebp), %edi
+ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax
+ andl $0x001ff000, %eax
+ shrl $9, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %edx
+ orl $PAGE_ATTR, %edx
+ movl %edx, (%eax)
+
+ /* identity map the control page at its physical address */
+
+ movl PTR(VA_PGD)(%ebp), %edi
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax
+ andl $0xc0000000, %eax
+ shrl $27, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_PMD_1)(%ebp), %edx
+ orl $PAE_PGD_ATTR, %edx
+ movl %edx, (%eax)
+
+ movl PTR(VA_PMD_1)(%ebp), %edi
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax
+ andl $0x3fe00000, %eax
+ shrl $18, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_PTE_1)(%ebp), %edx
+ orl $PAGE_ATTR, %edx
+ movl %edx, (%eax)
+
+ movl PTR(VA_PTE_1)(%ebp), %edi
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax
+ andl $0x001ff000, %eax
+ shrl $9, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %edx
+ orl $PAGE_ATTR, %edx
+ movl %edx, (%eax)
+#else
+ /* map the control page at its virtual address */
+
+ movl PTR(VA_PGD)(%ebp), %edi
+ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax
+ andl $0xffc00000, %eax
+ shrl $20, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_PTE_0)(%ebp), %edx
+ orl $PAGE_ATTR, %edx
+ movl %edx, (%eax)
+
+ movl PTR(VA_PTE_0)(%ebp), %edi
+ movl PTR(VA_CONTROL_PAGE)(%ebp), %eax
+ andl $0x003ff000, %eax
+ shrl $10, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %edx
+ orl $PAGE_ATTR, %edx
+ movl %edx, (%eax)
+
+ /* identity map the control page at its physical address */
+
+ movl PTR(VA_PGD)(%ebp), %edi
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax
+ andl $0xffc00000, %eax
+ shrl $20, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_PTE_1)(%ebp), %edx
+ orl $PAGE_ATTR, %edx
+ movl %edx, (%eax)
+
+ movl PTR(VA_PTE_1)(%ebp), %edi
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %eax
+ andl $0x003ff000, %eax
+ shrl $10, %eax
+ addl %edi, %eax
+
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %edx
+ orl $PAGE_ATTR, %edx
+ movl %edx, (%eax)
+#endif
+
relocate_new_kernel:
/* read the arguments and say goodbye to the stack */
movl 4(%esp), %ebx /* page_list */
- movl 8(%esp), %ebp /* reboot_code_buffer */
+ movl 8(%esp), %ebp /* list of pages */
movl 12(%esp), %edx /* start address */
movl 16(%esp), %ecx /* cpu_has_pae */
@@ -24,11 +146,26 @@
pushl $0
popfl
- /* set a new stack at the bottom of our page... */
- lea 4096(%ebp), %esp
+ /* get physical address of control page now */
+ /* this is impossible after page table switch */
+ movl PTR(PA_CONTROL_PAGE)(%ebp), %edi
- /* store the parameters back on the stack */
- pushl %edx /* store the start address */
+ /* switch to new set of page tables */
+ movl PTR(PA_PGD)(%ebp), %eax
+ movl %eax, %cr3
+
+ /* setup a new stack at the end of the physical control page */
+ lea 4096(%edi), %esp
+
+ /* jump to identity mapped page */
+ movl %edi, %eax
+ addl $(identity_mapped - relocate_kernel), %eax
+ pushl %eax
+ ret
+
+identity_mapped:
+ /* store the start address on the stack */
+ pushl %edx
/* Set cr0 to a known state:
* 31 0 == Paging disabled
@@ -113,8 +250,3 @@
xorl %edi, %edi
xorl %ebp, %ebp
ret
-relocate_new_kernel_end:
-
- .globl relocate_new_kernel_size
-relocate_new_kernel_size:
- .long relocate_new_kernel_end - relocate_new_kernel
diff --git a/arch/i386/kernel/semaphore.c b/arch/i386/kernel/semaphore.c
deleted file mode 100644
index 98352c3..0000000
--- a/arch/i386/kernel/semaphore.c
+++ /dev/null
@@ -1,134 +0,0 @@
-/*
- * i386 semaphore implementation.
- *
- * (C) Copyright 1999 Linus Torvalds
- *
- * Portions Copyright 1999 Red Hat, Inc.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * rw semaphores implemented November 1999 by Benjamin LaHaise <bcrl@kvack.org>
- */
-#include <asm/semaphore.h>
-
-/*
- * The semaphore operations have a special calling sequence that
- * allow us to do a simpler in-line version of them. These routines
- * need to convert that sequence back into the C sequence when
- * there is contention on the semaphore.
- *
- * %eax contains the semaphore pointer on entry. Save the C-clobbered
- * registers (%eax, %edx and %ecx) except %eax whish is either a return
- * value or just clobbered..
- */
-asm(
-".section .sched.text\n"
-".align 4\n"
-".globl __down_failed\n"
-"__down_failed:\n\t"
-#if defined(CONFIG_FRAME_POINTER)
- "pushl %ebp\n\t"
- "movl %esp,%ebp\n\t"
-#endif
- "pushl %edx\n\t"
- "pushl %ecx\n\t"
- "call __down\n\t"
- "popl %ecx\n\t"
- "popl %edx\n\t"
-#if defined(CONFIG_FRAME_POINTER)
- "movl %ebp,%esp\n\t"
- "popl %ebp\n\t"
-#endif
- "ret"
-);
-
-asm(
-".section .sched.text\n"
-".align 4\n"
-".globl __down_failed_interruptible\n"
-"__down_failed_interruptible:\n\t"
-#if defined(CONFIG_FRAME_POINTER)
- "pushl %ebp\n\t"
- "movl %esp,%ebp\n\t"
-#endif
- "pushl %edx\n\t"
- "pushl %ecx\n\t"
- "call __down_interruptible\n\t"
- "popl %ecx\n\t"
- "popl %edx\n\t"
-#if defined(CONFIG_FRAME_POINTER)
- "movl %ebp,%esp\n\t"
- "popl %ebp\n\t"
-#endif
- "ret"
-);
-
-asm(
-".section .sched.text\n"
-".align 4\n"
-".globl __down_failed_trylock\n"
-"__down_failed_trylock:\n\t"
-#if defined(CONFIG_FRAME_POINTER)
- "pushl %ebp\n\t"
- "movl %esp,%ebp\n\t"
-#endif
- "pushl %edx\n\t"
- "pushl %ecx\n\t"
- "call __down_trylock\n\t"
- "popl %ecx\n\t"
- "popl %edx\n\t"
-#if defined(CONFIG_FRAME_POINTER)
- "movl %ebp,%esp\n\t"
- "popl %ebp\n\t"
-#endif
- "ret"
-);
-
-asm(
-".section .sched.text\n"
-".align 4\n"
-".globl __up_wakeup\n"
-"__up_wakeup:\n\t"
- "pushl %edx\n\t"
- "pushl %ecx\n\t"
- "call __up\n\t"
- "popl %ecx\n\t"
- "popl %edx\n\t"
- "ret"
-);
-
-/*
- * rw spinlock fallbacks
- */
-#if defined(CONFIG_SMP)
-asm(
-".section .sched.text\n"
-".align 4\n"
-".globl __write_lock_failed\n"
-"__write_lock_failed:\n\t"
- LOCK_PREFIX "addl $" RW_LOCK_BIAS_STR ",(%eax)\n"
-"1: rep; nop\n\t"
- "cmpl $" RW_LOCK_BIAS_STR ",(%eax)\n\t"
- "jne 1b\n\t"
- LOCK_PREFIX "subl $" RW_LOCK_BIAS_STR ",(%eax)\n\t"
- "jnz __write_lock_failed\n\t"
- "ret"
-);
-
-asm(
-".section .sched.text\n"
-".align 4\n"
-".globl __read_lock_failed\n"
-"__read_lock_failed:\n\t"
- LOCK_PREFIX "incl (%eax)\n"
-"1: rep; nop\n\t"
- "cmpl $1,(%eax)\n\t"
- "js 1b\n\t"
- LOCK_PREFIX "decl (%eax)\n\t"
- "js __read_lock_failed\n\t"
- "ret"
-);
-#endif
diff --git a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c
index 16d9944..76a524b 100644
--- a/arch/i386/kernel/setup.c
+++ b/arch/i386/kernel/setup.c
@@ -90,18 +90,6 @@
unsigned long mmu_cr4_features;
-#ifdef CONFIG_ACPI
- int acpi_disabled = 0;
-#else
- int acpi_disabled = 1;
-#endif
-EXPORT_SYMBOL(acpi_disabled);
-
-#ifdef CONFIG_ACPI
-int __initdata acpi_force = 0;
-extern acpi_interrupt_flags acpi_sci_flags;
-#endif
-
/* for MCA, but anyone else can use it if they want */
unsigned int machine_id;
#ifdef CONFIG_MCA
@@ -149,7 +137,6 @@
struct e820map e820;
extern void early_cpu_init(void);
-extern void generic_apic_probe(char *);
extern int root_mountflags;
unsigned long saved_videomode;
@@ -701,238 +688,132 @@
}
#endif
-static void __init parse_cmdline_early (char ** cmdline_p)
+static int __initdata user_defined_memmap = 0;
+
+/*
+ * "mem=nopentium" disables the 4MB page tables.
+ * "mem=XXX[kKmM]" defines a memory region from HIGH_MEM
+ * to <mem>, overriding the bios size.
+ * "memmap=XXX[KkmM]@XXX[KkmM]" defines a memory region from
+ * <start> to <start>+<mem>, overriding the bios size.
+ *
+ * HPA tells me bootloaders need to parse mem=, so no new
+ * option should be mem= [also see Documentation/i386/boot.txt]
+ */
+static int __init parse_mem(char *arg)
{
- char c = ' ', *to = command_line, *from = saved_command_line;
- int len = 0;
- int userdef = 0;
+ if (!arg)
+ return -EINVAL;
- /* Save unparsed command line copy for /proc/cmdline */
- saved_command_line[COMMAND_LINE_SIZE-1] = '\0';
-
- for (;;) {
- if (c != ' ')
- goto next_char;
- /*
- * "mem=nopentium" disables the 4MB page tables.
- * "mem=XXX[kKmM]" defines a memory region from HIGH_MEM
- * to <mem>, overriding the bios size.
- * "memmap=XXX[KkmM]@XXX[KkmM]" defines a memory region from
- * <start> to <start>+<mem>, overriding the bios size.
- *
- * HPA tells me bootloaders need to parse mem=, so no new
- * option should be mem= [also see Documentation/i386/boot.txt]
+ if (strcmp(arg, "nopentium") == 0) {
+ clear_bit(X86_FEATURE_PSE, boot_cpu_data.x86_capability);
+ disable_pse = 1;
+ } else {
+ /* If the user specifies memory size, we
+ * limit the BIOS-provided memory map to
+ * that size. exactmap can be used to specify
+ * the exact map. mem=number can be used to
+ * trim the existing memory map.
*/
- if (!memcmp(from, "mem=", 4)) {
- if (to != command_line)
- to--;
- if (!memcmp(from+4, "nopentium", 9)) {
- from += 9+4;
- clear_bit(X86_FEATURE_PSE, boot_cpu_data.x86_capability);
- disable_pse = 1;
- } else {
- /* If the user specifies memory size, we
- * limit the BIOS-provided memory map to
- * that size. exactmap can be used to specify
- * the exact map. mem=number can be used to
- * trim the existing memory map.
- */
- unsigned long long mem_size;
+ unsigned long long mem_size;
- mem_size = memparse(from+4, &from);
- limit_regions(mem_size);
- userdef=1;
- }
- }
-
- else if (!memcmp(from, "memmap=", 7)) {
- if (to != command_line)
- to--;
- if (!memcmp(from+7, "exactmap", 8)) {
-#ifdef CONFIG_CRASH_DUMP
- /* If we are doing a crash dump, we
- * still need to know the real mem
- * size before original memory map is
- * reset.
- */
- find_max_pfn();
- saved_max_pfn = max_pfn;
-#endif
- from += 8+7;
- e820.nr_map = 0;
- userdef = 1;
- } else {
- /* If the user specifies memory size, we
- * limit the BIOS-provided memory map to
- * that size. exactmap can be used to specify
- * the exact map. mem=number can be used to
- * trim the existing memory map.
- */
- unsigned long long start_at, mem_size;
-
- mem_size = memparse(from+7, &from);
- if (*from == '@') {
- start_at = memparse(from+1, &from);
- add_memory_region(start_at, mem_size, E820_RAM);
- } else if (*from == '#') {
- start_at = memparse(from+1, &from);
- add_memory_region(start_at, mem_size, E820_ACPI);
- } else if (*from == '$') {
- start_at = memparse(from+1, &from);
- add_memory_region(start_at, mem_size, E820_RESERVED);
- } else {
- limit_regions(mem_size);
- userdef=1;
- }
- }
- }
-
- else if (!memcmp(from, "noexec=", 7))
- noexec_setup(from + 7);
-
-
-#ifdef CONFIG_X86_SMP
- /*
- * If the BIOS enumerates physical processors before logical,
- * maxcpus=N at enumeration-time can be used to disable HT.
- */
- else if (!memcmp(from, "maxcpus=", 8)) {
- extern unsigned int maxcpus;
-
- maxcpus = simple_strtoul(from + 8, NULL, 0);
- }
-#endif
-
-#ifdef CONFIG_ACPI
- /* "acpi=off" disables both ACPI table parsing and interpreter */
- else if (!memcmp(from, "acpi=off", 8)) {
- disable_acpi();
- }
-
- /* acpi=force to over-ride black-list */
- else if (!memcmp(from, "acpi=force", 10)) {
- acpi_force = 1;
- acpi_ht = 1;
- acpi_disabled = 0;
- }
-
- /* acpi=strict disables out-of-spec workarounds */
- else if (!memcmp(from, "acpi=strict", 11)) {
- acpi_strict = 1;
- }
-
- /* Limit ACPI just to boot-time to enable HT */
- else if (!memcmp(from, "acpi=ht", 7)) {
- if (!acpi_force)
- disable_acpi();
- acpi_ht = 1;
- }
-
- /* "pci=noacpi" disable ACPI IRQ routing and PCI scan */
- else if (!memcmp(from, "pci=noacpi", 10)) {
- acpi_disable_pci();
- }
- /* "acpi=noirq" disables ACPI interrupt routing */
- else if (!memcmp(from, "acpi=noirq", 10)) {
- acpi_noirq_set();
- }
-
- else if (!memcmp(from, "acpi_sci=edge", 13))
- acpi_sci_flags.trigger = 1;
-
- else if (!memcmp(from, "acpi_sci=level", 14))
- acpi_sci_flags.trigger = 3;
-
- else if (!memcmp(from, "acpi_sci=high", 13))
- acpi_sci_flags.polarity = 1;
-
- else if (!memcmp(from, "acpi_sci=low", 12))
- acpi_sci_flags.polarity = 3;
-
-#ifdef CONFIG_X86_IO_APIC
- else if (!memcmp(from, "acpi_skip_timer_override", 24))
- acpi_skip_timer_override = 1;
-
- if (!memcmp(from, "disable_timer_pin_1", 19))
- disable_timer_pin_1 = 1;
- if (!memcmp(from, "enable_timer_pin_1", 18))
- disable_timer_pin_1 = -1;
-
- /* disable IO-APIC */
- else if (!memcmp(from, "noapic", 6))
- disable_ioapic_setup();
-#endif /* CONFIG_X86_IO_APIC */
-#endif /* CONFIG_ACPI */
-
-#ifdef CONFIG_X86_LOCAL_APIC
- /* enable local APIC */
- else if (!memcmp(from, "lapic", 5))
- lapic_enable();
-
- /* disable local APIC */
- else if (!memcmp(from, "nolapic", 6))
- lapic_disable();
-#endif /* CONFIG_X86_LOCAL_APIC */
-
-#ifdef CONFIG_KEXEC
- /* crashkernel=size@addr specifies the location to reserve for
- * a crash kernel. By reserving this memory we guarantee
- * that linux never set's it up as a DMA target.
- * Useful for holding code to do something appropriate
- * after a kernel panic.
- */
- else if (!memcmp(from, "crashkernel=", 12)) {
- unsigned long size, base;
- size = memparse(from+12, &from);
- if (*from == '@') {
- base = memparse(from+1, &from);
- /* FIXME: Do I want a sanity check
- * to validate the memory range?
- */
- crashk_res.start = base;
- crashk_res.end = base + size - 1;
- }
- }
-#endif
-#ifdef CONFIG_PROC_VMCORE
- /* elfcorehdr= specifies the location of elf core header
- * stored by the crashed kernel.
- */
- else if (!memcmp(from, "elfcorehdr=", 11))
- elfcorehdr_addr = memparse(from+11, &from);
-#endif
-
- /*
- * highmem=size forces highmem to be exactly 'size' bytes.
- * This works even on boxes that have no highmem otherwise.
- * This also works to reduce highmem size on bigger boxes.
- */
- else if (!memcmp(from, "highmem=", 8))
- highmem_pages = memparse(from+8, &from) >> PAGE_SHIFT;
-
- /*
- * vmalloc=size forces the vmalloc area to be exactly 'size'
- * bytes. This can be used to increase (or decrease) the
- * vmalloc area - the default is 128m.
- */
- else if (!memcmp(from, "vmalloc=", 8))
- __VMALLOC_RESERVE = memparse(from+8, &from);
-
- next_char:
- c = *(from++);
- if (!c)
- break;
- if (COMMAND_LINE_SIZE <= ++len)
- break;
- *(to++) = c;
+ mem_size = memparse(arg, &arg);
+ limit_regions(mem_size);
+ user_defined_memmap = 1;
}
- *to = '\0';
- *cmdline_p = command_line;
- if (userdef) {
- printk(KERN_INFO "user-defined physical RAM map:\n");
- print_memory_map("user");
- }
+ return 0;
}
+early_param("mem", parse_mem);
+
+static int __init parse_memmap(char *arg)
+{
+ if (!arg)
+ return -EINVAL;
+
+ if (strcmp(arg, "exactmap") == 0) {
+#ifdef CONFIG_CRASH_DUMP
+ /* If we are doing a crash dump, we
+ * still need to know the real mem
+ * size before original memory map is
+ * reset.
+ */
+ find_max_pfn();
+ saved_max_pfn = max_pfn;
+#endif
+ e820.nr_map = 0;
+ user_defined_memmap = 1;
+ } else {
+ /* If the user specifies memory size, we
+ * limit the BIOS-provided memory map to
+ * that size. exactmap can be used to specify
+ * the exact map. mem=number can be used to
+ * trim the existing memory map.
+ */
+ unsigned long long start_at, mem_size;
+
+ mem_size = memparse(arg, &arg);
+ if (*arg == '@') {
+ start_at = memparse(arg+1, &arg);
+ add_memory_region(start_at, mem_size, E820_RAM);
+ } else if (*arg == '#') {
+ start_at = memparse(arg+1, &arg);
+ add_memory_region(start_at, mem_size, E820_ACPI);
+ } else if (*arg == '$') {
+ start_at = memparse(arg+1, &arg);
+ add_memory_region(start_at, mem_size, E820_RESERVED);
+ } else {
+ limit_regions(mem_size);
+ user_defined_memmap = 1;
+ }
+ }
+ return 0;
+}
+early_param("memmap", parse_memmap);
+
+#ifdef CONFIG_PROC_VMCORE
+/* elfcorehdr= specifies the location of elf core header
+ * stored by the crashed kernel.
+ */
+static int __init parse_elfcorehdr(char *arg)
+{
+ if (!arg)
+ return -EINVAL;
+
+ elfcorehdr_addr = memparse(arg, &arg);
+ return 0;
+}
+early_param("elfcorehdr", parse_elfcorehdr);
+#endif /* CONFIG_PROC_VMCORE */
+
+/*
+ * highmem=size forces highmem to be exactly 'size' bytes.
+ * This works even on boxes that have no highmem otherwise.
+ * This also works to reduce highmem size on bigger boxes.
+ */
+static int __init parse_highmem(char *arg)
+{
+ if (!arg)
+ return -EINVAL;
+
+ highmem_pages = memparse(arg, &arg) >> PAGE_SHIFT;
+ return 0;
+}
+early_param("highmem", parse_highmem);
+
+/*
+ * vmalloc=size forces the vmalloc area to be exactly 'size'
+ * bytes. This can be used to increase (or decrease) the
+ * vmalloc area - the default is 128m.
+ */
+static int __init parse_vmalloc(char *arg)
+{
+ if (!arg)
+ return -EINVAL;
+
+ __VMALLOC_RESERVE = memparse(arg, &arg);
+ return 0;
+}
+early_param("vmalloc", parse_vmalloc);
/*
* reservetop=size reserves a hole at the top of the kernel address space which
@@ -1189,6 +1070,14 @@
}
printk(KERN_NOTICE "%ldMB HIGHMEM available.\n",
pages_to_mb(highend_pfn - highstart_pfn));
+ num_physpages = highend_pfn;
+ high_memory = (void *) __va(highstart_pfn * PAGE_SIZE - 1) + 1;
+#else
+ num_physpages = max_low_pfn;
+ high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
+#endif
+#ifdef CONFIG_FLATMEM
+ max_mapnr = num_physpages;
#endif
printk(KERN_NOTICE "%ldMB LOWMEM available.\n",
pages_to_mb(max_low_pfn));
@@ -1518,17 +1407,15 @@
data_resource.start = virt_to_phys(_etext);
data_resource.end = virt_to_phys(_edata)-1;
- parse_cmdline_early(cmdline_p);
+ parse_early_param();
-#ifdef CONFIG_EARLY_PRINTK
- {
- char *s = strstr(*cmdline_p, "earlyprintk=");
- if (s) {
- setup_early_printk(strchr(s, '=') + 1);
- printk("early console enabled\n");
- }
+ if (user_defined_memmap) {
+ printk(KERN_INFO "user-defined physical RAM map:\n");
+ print_memory_map("user");
}
-#endif
+
+ strlcpy(command_line, saved_command_line, COMMAND_LINE_SIZE);
+ *cmdline_p = command_line;
max_low_pfn = setup_memory();
@@ -1557,7 +1444,7 @@
dmi_scan_machine();
#ifdef CONFIG_X86_GENERICARCH
- generic_apic_probe(*cmdline_p);
+ generic_apic_probe();
#endif
if (efi_enabled)
efi_map_memmap();
@@ -1569,9 +1456,11 @@
acpi_boot_table_init();
#endif
+#ifdef CONFIG_PCI
#ifdef CONFIG_X86_IO_APIC
check_acpi_pci(); /* Checks more than just ACPI actually */
#endif
+#endif
#ifdef CONFIG_ACPI
acpi_boot_init();
diff --git a/arch/i386/kernel/smpboot.c b/arch/i386/kernel/smpboot.c
index efe0799..020d873 100644
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -177,6 +177,9 @@
*/
if ((c->x86_vendor == X86_VENDOR_AMD) && (c->x86 == 6)) {
+ if (num_possible_cpus() == 1)
+ goto valid_k7;
+
/* Athlon 660/661 is valid. */
if ((c->x86_model==6) && ((c->x86_mask==0) || (c->x86_mask==1)))
goto valid_k7;
@@ -1376,7 +1379,8 @@
*/
if (cpu == 0)
return -EBUSY;
-
+ if (nmi_watchdog == NMI_LOCAL_APIC)
+ stop_apic_nmi_watchdog(NULL);
clear_local_APIC();
/* Allow any queued timer interrupts to get serviced */
local_irq_enable();
@@ -1490,3 +1494,16 @@
/* IPI for generic function call */
set_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
}
+
+/*
+ * If the BIOS enumerates physical processors before logical,
+ * maxcpus=N at enumeration-time can be used to disable HT.
+ */
+static int __init parse_maxcpus(char *arg)
+{
+ extern unsigned int maxcpus;
+
+ maxcpus = simple_strtoul(arg, NULL, 0);
+ return 0;
+}
+early_param("maxcpus", parse_maxcpus);
diff --git a/arch/i386/kernel/stacktrace.c b/arch/i386/kernel/stacktrace.c
deleted file mode 100644
index e62a037..0000000
--- a/arch/i386/kernel/stacktrace.c
+++ /dev/null
@@ -1,98 +0,0 @@
-/*
- * arch/i386/kernel/stacktrace.c
- *
- * Stack trace management functions
- *
- * Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
- */
-#include <linux/sched.h>
-#include <linux/stacktrace.h>
-
-static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
-{
- return p > (void *)tinfo &&
- p < (void *)tinfo + THREAD_SIZE - 3;
-}
-
-/*
- * Save stack-backtrace addresses into a stack_trace buffer:
- */
-static inline unsigned long
-save_context_stack(struct stack_trace *trace, unsigned int skip,
- struct thread_info *tinfo, unsigned long *stack,
- unsigned long ebp)
-{
- unsigned long addr;
-
-#ifdef CONFIG_FRAME_POINTER
- while (valid_stack_ptr(tinfo, (void *)ebp)) {
- addr = *(unsigned long *)(ebp + 4);
- if (!skip)
- trace->entries[trace->nr_entries++] = addr;
- else
- skip--;
- if (trace->nr_entries >= trace->max_entries)
- break;
- /*
- * break out of recursive entries (such as
- * end_of_stack_stop_unwind_function):
- */
- if (ebp == *(unsigned long *)ebp)
- break;
-
- ebp = *(unsigned long *)ebp;
- }
-#else
- while (valid_stack_ptr(tinfo, stack)) {
- addr = *stack++;
- if (__kernel_text_address(addr)) {
- if (!skip)
- trace->entries[trace->nr_entries++] = addr;
- else
- skip--;
- if (trace->nr_entries >= trace->max_entries)
- break;
- }
- }
-#endif
-
- return ebp;
-}
-
-/*
- * Save stack-backtrace addresses into a stack_trace buffer.
- * If all_contexts is set, all contexts (hardirq, softirq and process)
- * are saved. If not set then only the current context is saved.
- */
-void save_stack_trace(struct stack_trace *trace,
- struct task_struct *task, int all_contexts,
- unsigned int skip)
-{
- unsigned long ebp;
- unsigned long *stack = &ebp;
-
- WARN_ON(trace->nr_entries || !trace->max_entries);
-
- if (!task || task == current) {
- /* Grab ebp right from our regs: */
- asm ("movl %%ebp, %0" : "=r" (ebp));
- } else {
- /* ebp is the last reg pushed by switch_to(): */
- ebp = *(unsigned long *) task->thread.esp;
- }
-
- while (1) {
- struct thread_info *context = (struct thread_info *)
- ((unsigned long)stack & (~(THREAD_SIZE - 1)));
-
- ebp = save_context_stack(trace, skip, context, stack, ebp);
- stack = (unsigned long *)context->previous_esp;
- if (!all_contexts || !stack ||
- trace->nr_entries >= trace->max_entries)
- break;
- trace->entries[trace->nr_entries++] = ULONG_MAX;
- if (trace->nr_entries >= trace->max_entries)
- break;
- }
-}
-
diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index dd63d47..7e639f7 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -317,3 +317,4 @@
.long sys_tee /* 315 */
.long sys_vmsplice
.long sys_move_pages
+ .long sys_getcpu
diff --git a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
index 1302e4a..86944ac 100644
--- a/arch/i386/kernel/time.c
+++ b/arch/i386/kernel/time.c
@@ -130,18 +130,33 @@
int timer_ack;
-#if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
unsigned long profile_pc(struct pt_regs *regs)
{
unsigned long pc = instruction_pointer(regs);
- if (!user_mode_vm(regs) && in_lock_functions(pc))
+#ifdef CONFIG_SMP
+ if (!user_mode_vm(regs) && in_lock_functions(pc)) {
+#ifdef CONFIG_FRAME_POINTER
return *(unsigned long *)(regs->ebp + 4);
-
+#else
+ unsigned long *sp;
+ if ((regs->xcs & 3) == 0)
+ sp = (unsigned long *)®s->esp;
+ else
+ sp = (unsigned long *)regs->esp;
+ /* Return address is either directly at stack pointer
+ or above a saved eflags. Eflags has bits 22-31 zero,
+ kernel addresses don't. */
+ if (sp[0] >> 22)
+ return sp[0];
+ if (sp[1] >> 22)
+ return sp[1];
+#endif
+ }
+#endif
return pc;
}
EXPORT_SYMBOL(profile_pc);
-#endif
/*
* This is the same as the above, except we _also_ save the current
diff --git a/arch/i386/kernel/topology.c b/arch/i386/kernel/topology.c
index e2e281d..07d6da3 100644
--- a/arch/i386/kernel/topology.c
+++ b/arch/i386/kernel/topology.c
@@ -28,6 +28,7 @@
#include <linux/init.h>
#include <linux/smp.h>
#include <linux/nodemask.h>
+#include <linux/mmzone.h>
#include <asm/cpu.h>
static struct i386_cpu cpu_devices[NR_CPUS];
@@ -55,34 +56,18 @@
EXPORT_SYMBOL(arch_unregister_cpu);
#endif /*CONFIG_HOTPLUG_CPU*/
-
+static int __init topology_init(void)
+{
+ int i;
#ifdef CONFIG_NUMA
-#include <linux/mmzone.h>
-
-static int __init topology_init(void)
-{
- int i;
-
for_each_online_node(i)
register_one_node(i);
-
- for_each_present_cpu(i)
- arch_register_cpu(i);
- return 0;
-}
-
-#else /* !CONFIG_NUMA */
-
-static int __init topology_init(void)
-{
- int i;
-
- for_each_present_cpu(i)
- arch_register_cpu(i);
- return 0;
-}
-
#endif /* CONFIG_NUMA */
+ for_each_present_cpu(i)
+ arch_register_cpu(i);
+ return 0;
+}
+
subsys_initcall(topology_init);
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 4fcc6690..21aa1cd 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -51,6 +51,7 @@
#include <asm/smp.h>
#include <asm/arch_hooks.h>
#include <asm/kdebug.h>
+#include <asm/stacktrace.h>
#include <linux/module.h>
@@ -118,26 +119,16 @@
p < (void *)tinfo + THREAD_SIZE - 3;
}
-/*
- * Print one address/symbol entries per line.
- */
-static inline void print_addr_and_symbol(unsigned long addr, char *log_lvl)
-{
- printk(" [<%08lx>] ", addr);
-
- print_symbol("%s\n", addr);
-}
-
static inline unsigned long print_context_stack(struct thread_info *tinfo,
unsigned long *stack, unsigned long ebp,
- char *log_lvl)
+ struct stacktrace_ops *ops, void *data)
{
unsigned long addr;
#ifdef CONFIG_FRAME_POINTER
while (valid_stack_ptr(tinfo, (void *)ebp)) {
addr = *(unsigned long *)(ebp + 4);
- print_addr_and_symbol(addr, log_lvl);
+ ops->address(data, addr);
/*
* break out of recursive entries (such as
* end_of_stack_stop_unwind_function):
@@ -150,30 +141,37 @@
while (valid_stack_ptr(tinfo, stack)) {
addr = *stack++;
if (__kernel_text_address(addr))
- print_addr_and_symbol(addr, log_lvl);
+ ops->address(data, addr);
}
#endif
return ebp;
}
+struct ops_and_data {
+ struct stacktrace_ops *ops;
+ void *data;
+};
+
static asmlinkage int
-show_trace_unwind(struct unwind_frame_info *info, void *log_lvl)
+dump_trace_unwind(struct unwind_frame_info *info, void *data)
{
+ struct ops_and_data *oad = (struct ops_and_data *)data;
int n = 0;
while (unwind(info) == 0 && UNW_PC(info)) {
n++;
- print_addr_and_symbol(UNW_PC(info), log_lvl);
+ oad->ops->address(oad->data, UNW_PC(info));
if (arch_unw_user_mode(info))
break;
}
return n;
}
-static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
- unsigned long *stack, char *log_lvl)
+void dump_trace(struct task_struct *task, struct pt_regs *regs,
+ unsigned long *stack,
+ struct stacktrace_ops *ops, void *data)
{
- unsigned long ebp;
+ unsigned long ebp = 0;
if (!task)
task = current;
@@ -181,54 +179,116 @@
if (call_trace >= 0) {
int unw_ret = 0;
struct unwind_frame_info info;
+ struct ops_and_data oad = { .ops = ops, .data = data };
if (regs) {
if (unwind_init_frame_info(&info, task, regs) == 0)
- unw_ret = show_trace_unwind(&info, log_lvl);
+ unw_ret = dump_trace_unwind(&info, &oad);
} else if (task == current)
- unw_ret = unwind_init_running(&info, show_trace_unwind, log_lvl);
+ unw_ret = unwind_init_running(&info, dump_trace_unwind, &oad);
else {
if (unwind_init_blocked(&info, task) == 0)
- unw_ret = show_trace_unwind(&info, log_lvl);
+ unw_ret = dump_trace_unwind(&info, &oad);
}
if (unw_ret > 0) {
if (call_trace == 1 && !arch_unw_user_mode(&info)) {
- print_symbol("DWARF2 unwinder stuck at %s\n",
+ ops->warning_symbol(data, "DWARF2 unwinder stuck at %s\n",
UNW_PC(&info));
if (UNW_SP(&info) >= PAGE_OFFSET) {
- printk("Leftover inexact backtrace:\n");
+ ops->warning(data, "Leftover inexact backtrace:\n");
stack = (void *)UNW_SP(&info);
+ if (!stack)
+ return;
+ ebp = UNW_FP(&info);
} else
- printk("Full inexact backtrace again:\n");
+ ops->warning(data, "Full inexact backtrace again:\n");
} else if (call_trace >= 1)
return;
else
- printk("Full inexact backtrace again:\n");
+ ops->warning(data, "Full inexact backtrace again:\n");
} else
- printk("Inexact backtrace:\n");
+ ops->warning(data, "Inexact backtrace:\n");
+ }
+ if (!stack) {
+ unsigned long dummy;
+ stack = &dummy;
+ if (task && task != current)
+ stack = (unsigned long *)task->thread.esp;
}
- if (task == current) {
- /* Grab ebp right from our regs */
- asm ("movl %%ebp, %0" : "=r" (ebp) : );
- } else {
- /* ebp is the last reg pushed by switch_to */
- ebp = *(unsigned long *) task->thread.esp;
+#ifdef CONFIG_FRAME_POINTER
+ if (!ebp) {
+ if (task == current) {
+ /* Grab ebp right from our regs */
+ asm ("movl %%ebp, %0" : "=r" (ebp) : );
+ } else {
+ /* ebp is the last reg pushed by switch_to */
+ ebp = *(unsigned long *) task->thread.esp;
+ }
}
+#endif
while (1) {
struct thread_info *context;
context = (struct thread_info *)
((unsigned long)stack & (~(THREAD_SIZE - 1)));
- ebp = print_context_stack(context, stack, ebp, log_lvl);
+ ebp = print_context_stack(context, stack, ebp, ops, data);
+ /* Should be after the line below, but somewhere
+ in early boot context comes out corrupted and we
+ can't reference it -AK */
+ if (ops->stack(data, "IRQ") < 0)
+ break;
stack = (unsigned long*)context->previous_esp;
if (!stack)
break;
- printk("%s =======================\n", log_lvl);
}
}
+EXPORT_SYMBOL(dump_trace);
-void show_trace(struct task_struct *task, struct pt_regs *regs, unsigned long * stack)
+static void
+print_trace_warning_symbol(void *data, char *msg, unsigned long symbol)
+{
+ printk(data);
+ print_symbol(msg, symbol);
+ printk("\n");
+}
+
+static void print_trace_warning(void *data, char *msg)
+{
+ printk("%s%s\n", (char *)data, msg);
+}
+
+static int print_trace_stack(void *data, char *name)
+{
+ return 0;
+}
+
+/*
+ * Print one address/symbol entries per line.
+ */
+static void print_trace_address(void *data, unsigned long addr)
+{
+ printk("%s [<%08lx>] ", (char *)data, addr);
+ print_symbol("%s\n", addr);
+}
+
+static struct stacktrace_ops print_trace_ops = {
+ .warning = print_trace_warning,
+ .warning_symbol = print_trace_warning_symbol,
+ .stack = print_trace_stack,
+ .address = print_trace_address,
+};
+
+static void
+show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
+ unsigned long * stack, char *log_lvl)
+{
+ dump_trace(task, regs, stack, &print_trace_ops, log_lvl);
+ printk("%s =======================\n", log_lvl);
+}
+
+void show_trace(struct task_struct *task, struct pt_regs *regs,
+ unsigned long * stack)
{
show_trace_log_lvl(task, regs, stack, "");
}
@@ -291,8 +351,9 @@
ss = regs->xss & 0xffff;
}
print_modules();
- printk(KERN_EMERG "CPU: %d\nEIP: %04x:[<%08lx>] %s VLI\n"
- "EFLAGS: %08lx (%s %.*s) \n",
+ printk(KERN_EMERG "CPU: %d\n"
+ KERN_EMERG "EIP: %04x:[<%08lx>] %s VLI\n"
+ KERN_EMERG "EFLAGS: %08lx (%s %.*s)\n",
smp_processor_id(), 0xffff & regs->xcs, regs->eip,
print_tainted(), regs->eflags, system_utsname.release,
(int)strcspn(system_utsname.version, " "),
@@ -634,18 +695,24 @@
}
}
-static void mem_parity_error(unsigned char reason, struct pt_regs * regs)
+static __kprobes void
+mem_parity_error(unsigned char reason, struct pt_regs * regs)
{
- printk(KERN_EMERG "Uhhuh. NMI received. Dazed and confused, but trying "
- "to continue\n");
+ printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x on "
+ "CPU %d.\n", reason, smp_processor_id());
printk(KERN_EMERG "You probably have a hardware problem with your RAM "
"chips\n");
+ if (panic_on_unrecovered_nmi)
+ panic("NMI: Not continuing");
+
+ printk(KERN_EMERG "Dazed and confused, but trying to continue\n");
/* Clear and disable the memory parity error line. */
clear_mem_error(reason);
}
-static void io_check_error(unsigned char reason, struct pt_regs * regs)
+static __kprobes void
+io_check_error(unsigned char reason, struct pt_regs * regs)
{
unsigned long i;
@@ -661,7 +728,8 @@
outb(reason, 0x61);
}
-static void unknown_nmi_error(unsigned char reason, struct pt_regs * regs)
+static __kprobes void
+unknown_nmi_error(unsigned char reason, struct pt_regs * regs)
{
#ifdef CONFIG_MCA
/* Might actually be able to figure out what the guilty party
@@ -671,15 +739,18 @@
return;
}
#endif
- printk("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
- reason, smp_processor_id());
- printk("Dazed and confused, but trying to continue\n");
- printk("Do you have a strange power saving mode enabled?\n");
+ printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x on "
+ "CPU %d.\n", reason, smp_processor_id());
+ printk(KERN_EMERG "Do you have a strange power saving mode enabled?\n");
+ if (panic_on_unrecovered_nmi)
+ panic("NMI: Not continuing");
+
+ printk(KERN_EMERG "Dazed and confused, but trying to continue\n");
}
static DEFINE_SPINLOCK(nmi_print_lock);
-void die_nmi (struct pt_regs *regs, const char *msg)
+void __kprobes die_nmi(struct pt_regs *regs, const char *msg)
{
if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 2, SIGINT) ==
NOTIFY_STOP)
@@ -711,7 +782,7 @@
do_exit(SIGSEGV);
}
-static void default_do_nmi(struct pt_regs * regs)
+static __kprobes void default_do_nmi(struct pt_regs * regs)
{
unsigned char reason = 0;
@@ -728,12 +799,12 @@
* Ok, so this is none of the documented NMI sources,
* so it must be the NMI watchdog.
*/
- if (nmi_watchdog) {
- nmi_watchdog_tick(regs);
+ if (nmi_watchdog_tick(regs, reason))
return;
- }
+ if (!do_nmi_callback(regs, smp_processor_id()))
#endif
- unknown_nmi_error(reason, regs);
+ unknown_nmi_error(reason, regs);
+
return;
}
if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP)
@@ -749,14 +820,7 @@
reassert_nmi();
}
-static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
-{
- return 0;
-}
-
-static nmi_callback_t nmi_callback = dummy_nmi_callback;
-
-fastcall void do_nmi(struct pt_regs * regs, long error_code)
+fastcall __kprobes void do_nmi(struct pt_regs * regs, long error_code)
{
int cpu;
@@ -766,25 +830,11 @@
++nmi_count(cpu);
- if (!rcu_dereference(nmi_callback)(regs, cpu))
- default_do_nmi(regs);
+ default_do_nmi(regs);
nmi_exit();
}
-void set_nmi_callback(nmi_callback_t callback)
-{
- vmalloc_sync_all();
- rcu_assign_pointer(nmi_callback, callback);
-}
-EXPORT_SYMBOL_GPL(set_nmi_callback);
-
-void unset_nmi_callback(void)
-{
- nmi_callback = dummy_nmi_callback;
-}
-EXPORT_SYMBOL_GPL(unset_nmi_callback);
-
#ifdef CONFIG_KPROBES
fastcall void __kprobes do_int3(struct pt_regs *regs, long error_code)
{
@@ -1124,20 +1174,6 @@
}
#endif
-#define _set_gate(gate_addr,type,dpl,addr,seg) \
-do { \
- int __d0, __d1; \
- __asm__ __volatile__ ("movw %%dx,%%ax\n\t" \
- "movw %4,%%dx\n\t" \
- "movl %%eax,%0\n\t" \
- "movl %%edx,%1" \
- :"=m" (*((long *) (gate_addr))), \
- "=m" (*(1+(long *) (gate_addr))), "=&a" (__d0), "=&d" (__d1) \
- :"i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
- "3" ((char *) (addr)),"2" ((seg) << 16)); \
-} while (0)
-
-
/*
* This needs to use 'idt_table' rather than 'idt', and
* thus use the _nonmapped_ version of the IDT, as the
@@ -1146,7 +1182,7 @@
*/
void set_intr_gate(unsigned int n, void *addr)
{
- _set_gate(idt_table+n,14,0,addr,__KERNEL_CS);
+ _set_gate(n, DESCTYPE_INT, addr, __KERNEL_CS);
}
/*
@@ -1154,22 +1190,22 @@
*/
static inline void set_system_intr_gate(unsigned int n, void *addr)
{
- _set_gate(idt_table+n, 14, 3, addr, __KERNEL_CS);
+ _set_gate(n, DESCTYPE_INT | DESCTYPE_DPL3, addr, __KERNEL_CS);
}
static void __init set_trap_gate(unsigned int n, void *addr)
{
- _set_gate(idt_table+n,15,0,addr,__KERNEL_CS);
+ _set_gate(n, DESCTYPE_TRAP, addr, __KERNEL_CS);
}
static void __init set_system_gate(unsigned int n, void *addr)
{
- _set_gate(idt_table+n,15,3,addr,__KERNEL_CS);
+ _set_gate(n, DESCTYPE_TRAP | DESCTYPE_DPL3, addr, __KERNEL_CS);
}
static void __init set_task_gate(unsigned int n, unsigned int gdt_entry)
{
- _set_gate(idt_table+n,5,0,0,(gdt_entry<<3));
+ _set_gate(n, DESCTYPE_TASK, (void *)0, (gdt_entry<<3));
}
diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
index 7e0d8da..b8fa0a8 100644
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@@ -192,7 +192,7 @@
EXPORT_SYMBOL(recalibrate_cpu_khz);
-void tsc_init(void)
+void __init tsc_init(void)
{
if (!cpu_has_tsc || tsc_disable)
return;
diff --git a/arch/i386/lib/Makefile b/arch/i386/lib/Makefile
index 914933e..d86a548 100644
--- a/arch/i386/lib/Makefile
+++ b/arch/i386/lib/Makefile
@@ -4,6 +4,6 @@
lib-y = checksum.o delay.o usercopy.o getuser.o putuser.o memcpy.o strstr.o \
- bitops.o
+ bitops.o semaphore.o
lib-$(CONFIG_X86_USE_3DNOW) += mmx.o
diff --git a/arch/i386/lib/semaphore.S b/arch/i386/lib/semaphore.S
new file mode 100644
index 0000000..01f80b5
--- /dev/null
+++ b/arch/i386/lib/semaphore.S
@@ -0,0 +1,217 @@
+/*
+ * i386 semaphore implementation.
+ *
+ * (C) Copyright 1999 Linus Torvalds
+ *
+ * Portions Copyright 1999 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * rw semaphores implemented November 1999 by Benjamin LaHaise <bcrl@kvack.org>
+ */
+
+#include <linux/config.h>
+#include <linux/linkage.h>
+#include <asm/rwlock.h>
+#include <asm/alternative-asm.i>
+#include <asm/frame.i>
+#include <asm/dwarf2.h>
+
+/*
+ * The semaphore operations have a special calling sequence that
+ * allow us to do a simpler in-line version of them. These routines
+ * need to convert that sequence back into the C sequence when
+ * there is contention on the semaphore.
+ *
+ * %eax contains the semaphore pointer on entry. Save the C-clobbered
+ * registers (%eax, %edx and %ecx) except %eax whish is either a return
+ * value or just clobbered..
+ */
+ .section .sched.text
+ENTRY(__down_failed)
+ CFI_STARTPROC
+ FRAME
+ pushl %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET edx,0
+ pushl %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ call __down
+ popl %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE ecx
+ popl %edx
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE edx
+ ENDFRAME
+ ret
+ CFI_ENDPROC
+ END(__down_failed)
+
+ENTRY(__down_failed_interruptible)
+ CFI_STARTPROC
+ FRAME
+ pushl %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET edx,0
+ pushl %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ call __down_interruptible
+ popl %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE ecx
+ popl %edx
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE edx
+ ENDFRAME
+ ret
+ CFI_ENDPROC
+ END(__down_failed_interruptible)
+
+ENTRY(__down_failed_trylock)
+ CFI_STARTPROC
+ FRAME
+ pushl %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET edx,0
+ pushl %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ call __down_trylock
+ popl %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE ecx
+ popl %edx
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE edx
+ ENDFRAME
+ ret
+ CFI_ENDPROC
+ END(__down_failed_trylock)
+
+ENTRY(__up_wakeup)
+ CFI_STARTPROC
+ FRAME
+ pushl %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET edx,0
+ pushl %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ call __up
+ popl %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE ecx
+ popl %edx
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE edx
+ ENDFRAME
+ ret
+ CFI_ENDPROC
+ END(__up_wakeup)
+
+/*
+ * rw spinlock fallbacks
+ */
+#ifdef CONFIG_SMP
+ENTRY(__write_lock_failed)
+ CFI_STARTPROC simple
+ FRAME
+2: LOCK_PREFIX
+ addl $ RW_LOCK_BIAS,(%eax)
+1: rep; nop
+ cmpl $ RW_LOCK_BIAS,(%eax)
+ jne 1b
+ LOCK_PREFIX
+ subl $ RW_LOCK_BIAS,(%eax)
+ jnz 2b
+ ENDFRAME
+ ret
+ CFI_ENDPROC
+ END(__write_lock_failed)
+
+ENTRY(__read_lock_failed)
+ CFI_STARTPROC
+ FRAME
+2: LOCK_PREFIX
+ incl (%eax)
+1: rep; nop
+ cmpl $1,(%eax)
+ js 1b
+ LOCK_PREFIX
+ decl (%eax)
+ js 2b
+ ENDFRAME
+ ret
+ CFI_ENDPROC
+ END(__read_lock_failed)
+
+#endif
+
+/* Fix up special calling conventions */
+ENTRY(call_rwsem_down_read_failed)
+ CFI_STARTPROC
+ push %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ push %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET edx,0
+ call rwsem_down_read_failed
+ pop %edx
+ CFI_ADJUST_CFA_OFFSET -4
+ pop %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ ret
+ CFI_ENDPROC
+ END(call_rwsem_down_read_failed)
+
+ENTRY(call_rwsem_down_write_failed)
+ CFI_STARTPROC
+ push %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ calll rwsem_down_write_failed
+ pop %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ ret
+ CFI_ENDPROC
+ END(call_rwsem_down_write_failed)
+
+ENTRY(call_rwsem_wake)
+ CFI_STARTPROC
+ decw %dx /* do nothing if still outstanding active readers */
+ jnz 1f
+ push %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ call rwsem_wake
+ pop %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+1: ret
+ CFI_ENDPROC
+ END(call_rwsem_wake)
+
+/* Fix up special calling conventions */
+ENTRY(call_rwsem_downgrade_wake)
+ CFI_STARTPROC
+ push %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ push %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET edx,0
+ call rwsem_downgrade_wake
+ pop %edx
+ CFI_ADJUST_CFA_OFFSET -4
+ pop %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ ret
+ CFI_ENDPROC
+ END(call_rwsem_downgrade_wake)
+
diff --git a/arch/i386/mach-generic/bigsmp.c b/arch/i386/mach-generic/bigsmp.c
index ef7a6e6..33d9f93 100644
--- a/arch/i386/mach-generic/bigsmp.c
+++ b/arch/i386/mach-generic/bigsmp.c
@@ -5,6 +5,7 @@
#define APIC_DEFINITION 1
#include <linux/threads.h>
#include <linux/cpumask.h>
+#include <asm/smp.h>
#include <asm/mpspec.h>
#include <asm/genapic.h>
#include <asm/fixmap.h>
diff --git a/arch/i386/mach-generic/es7000.c b/arch/i386/mach-generic/es7000.c
index 845cdd0..aa144d8 100644
--- a/arch/i386/mach-generic/es7000.c
+++ b/arch/i386/mach-generic/es7000.c
@@ -4,6 +4,7 @@
#define APIC_DEFINITION 1
#include <linux/threads.h>
#include <linux/cpumask.h>
+#include <asm/smp.h>
#include <asm/mpspec.h>
#include <asm/genapic.h>
#include <asm/fixmap.h>
diff --git a/arch/i386/mach-generic/probe.c b/arch/i386/mach-generic/probe.c
index bcd1bcf..94b1fd9 100644
--- a/arch/i386/mach-generic/probe.c
+++ b/arch/i386/mach-generic/probe.c
@@ -9,6 +9,7 @@
#include <linux/kernel.h>
#include <linux/ctype.h>
#include <linux/init.h>
+#include <linux/errno.h>
#include <asm/fixmap.h>
#include <asm/mpspec.h>
#include <asm/apicdef.h>
@@ -29,7 +30,24 @@
NULL,
};
-static int cmdline_apic;
+static int cmdline_apic __initdata;
+static int __init parse_apic(char *arg)
+{
+ int i;
+
+ if (!arg)
+ return -EINVAL;
+
+ for (i = 0; apic_probe[i]; i++) {
+ if (!strcmp(apic_probe[i]->name, arg)) {
+ genapic = apic_probe[i];
+ cmdline_apic = 1;
+ return 0;
+ }
+ }
+ return -ENOENT;
+}
+early_param("apic", parse_apic);
void __init generic_bigsmp_probe(void)
{
@@ -48,40 +66,20 @@
}
}
-void __init generic_apic_probe(char *command_line)
+void __init generic_apic_probe(void)
{
- char *s;
- int i;
- int changed = 0;
-
- s = strstr(command_line, "apic=");
- if (s && (s == command_line || isspace(s[-1]))) {
- char *p = strchr(s, ' '), old;
- if (!p)
- p = strchr(s, '\0');
- old = *p;
- *p = 0;
- for (i = 0; !changed && apic_probe[i]; i++) {
- if (!strcmp(apic_probe[i]->name, s+5)) {
- changed = 1;
+ if (!cmdline_apic) {
+ int i;
+ for (i = 0; apic_probe[i]; i++) {
+ if (apic_probe[i]->probe()) {
genapic = apic_probe[i];
+ break;
}
}
- if (!changed)
- printk(KERN_ERR "Unknown genapic `%s' specified.\n", s);
- *p = old;
- cmdline_apic = changed;
- }
- for (i = 0; !changed && apic_probe[i]; i++) {
- if (apic_probe[i]->probe()) {
- changed = 1;
- genapic = apic_probe[i];
- }
+ /* Not visible without early console */
+ if (!apic_probe[i])
+ panic("Didn't find an APIC driver");
}
- /* Not visible without early console */
- if (!changed)
- panic("Didn't find an APIC driver");
-
printk(KERN_INFO "Using APIC driver %s\n", genapic->name);
}
@@ -119,7 +117,9 @@
return 0;
}
+#ifdef CONFIG_SMP
int hard_smp_processor_id(void)
{
return genapic->get_apic_id(*(unsigned long *)(APIC_BASE+APIC_ID));
}
+#endif
diff --git a/arch/i386/mach-generic/summit.c b/arch/i386/mach-generic/summit.c
index b73501d..f7e5d66 100644
--- a/arch/i386/mach-generic/summit.c
+++ b/arch/i386/mach-generic/summit.c
@@ -4,6 +4,7 @@
#define APIC_DEFINITION 1
#include <linux/threads.h>
#include <linux/cpumask.h>
+#include <asm/smp.h>
#include <asm/mpspec.h>
#include <asm/genapic.h>
#include <asm/fixmap.h>
diff --git a/arch/i386/mm/discontig.c b/arch/i386/mm/discontig.c
index fb5d8b7..941d1a5 100644
--- a/arch/i386/mm/discontig.c
+++ b/arch/i386/mm/discontig.c
@@ -322,6 +322,11 @@
highstart_pfn = system_max_low_pfn;
printk(KERN_NOTICE "%ldMB HIGHMEM available.\n",
pages_to_mb(highend_pfn - highstart_pfn));
+ num_physpages = highend_pfn;
+ high_memory = (void *) __va(highstart_pfn * PAGE_SIZE - 1) + 1;
+#else
+ num_physpages = system_max_low_pfn;
+ high_memory = (void *) __va(system_max_low_pfn * PAGE_SIZE - 1) + 1;
#endif
printk(KERN_NOTICE "%ldMB LOWMEM available.\n",
pages_to_mb(system_max_low_pfn));
diff --git a/arch/i386/mm/extable.c b/arch/i386/mm/extable.c
index de03c54..0ce4f22 100644
--- a/arch/i386/mm/extable.c
+++ b/arch/i386/mm/extable.c
@@ -11,7 +11,7 @@
const struct exception_table_entry *fixup;
#ifdef CONFIG_PNPBIOS
- if (unlikely((regs->xcs & ~15) == (GDT_ENTRY_PNPBIOS_BASE << 3)))
+ if (unlikely(SEGMENT_IS_PNP_CODE(regs->xcs)))
{
extern u32 pnp_bios_fault_eip, pnp_bios_fault_esp;
extern u32 pnp_bios_is_utter_crap;
diff --git a/arch/i386/mm/fault.c b/arch/i386/mm/fault.c
index f727946..5e17a3f 100644
--- a/arch/i386/mm/fault.c
+++ b/arch/i386/mm/fault.c
@@ -27,21 +27,24 @@
#include <asm/uaccess.h>
#include <asm/desc.h>
#include <asm/kdebug.h>
+#include <asm/segment.h>
extern void die(const char *,struct pt_regs *,long);
-#ifdef CONFIG_KPROBES
-ATOMIC_NOTIFIER_HEAD(notify_page_fault_chain);
+static ATOMIC_NOTIFIER_HEAD(notify_page_fault_chain);
+
int register_page_fault_notifier(struct notifier_block *nb)
{
vmalloc_sync_all();
return atomic_notifier_chain_register(¬ify_page_fault_chain, nb);
}
+EXPORT_SYMBOL_GPL(register_page_fault_notifier);
int unregister_page_fault_notifier(struct notifier_block *nb)
{
return atomic_notifier_chain_unregister(¬ify_page_fault_chain, nb);
}
+EXPORT_SYMBOL_GPL(unregister_page_fault_notifier);
static inline int notify_page_fault(enum die_val val, const char *str,
struct pt_regs *regs, long err, int trap, int sig)
@@ -55,14 +58,6 @@
};
return atomic_notifier_call_chain(¬ify_page_fault_chain, val, &args);
}
-#else
-static inline int notify_page_fault(enum die_val val, const char *str,
- struct pt_regs *regs, long err, int trap, int sig)
-{
- return NOTIFY_DONE;
-}
-#endif
-
/*
* Unlock any spinlocks which will prevent us from getting the
@@ -119,10 +114,10 @@
}
/* The standard kernel/user address space limit. */
- *eip_limit = (seg & 3) ? USER_DS.seg : KERNEL_DS.seg;
+ *eip_limit = user_mode(regs) ? USER_DS.seg : KERNEL_DS.seg;
/* By far the most common cases. */
- if (likely(seg == __USER_CS || seg == __KERNEL_CS))
+ if (likely(SEGMENT_IS_FLAT_CODE(seg)))
return eip;
/* Check the segment exists, is within the current LDT/GDT size,
@@ -436,11 +431,7 @@
write = 0;
switch (error_code & 3) {
default: /* 3: write, present */
-#ifdef TEST_VERIFY_AREA
- if (regs->cs == KERNEL_CS)
- printk("WP fault at %08lx\n", regs->eip);
-#endif
- /* fall through */
+ /* fall through */
case 2: /* write, not present */
if (!(vma->vm_flags & VM_WRITE))
goto bad_area;
diff --git a/arch/i386/mm/highmem.c b/arch/i386/mm/highmem.c
index b6eb4dc..ba44000 100644
--- a/arch/i386/mm/highmem.c
+++ b/arch/i386/mm/highmem.c
@@ -54,7 +54,7 @@
unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();
- if (vaddr < FIXADDR_START) { // FIXME
+ if (vaddr >= PAGE_OFFSET && vaddr < (unsigned long)high_memory) {
dec_preempt_count();
preempt_check_resched();
return;
diff --git a/arch/i386/mm/init.c b/arch/i386/mm/init.c
index efd0bcd..4a5a914 100644
--- a/arch/i386/mm/init.c
+++ b/arch/i386/mm/init.c
@@ -435,16 +435,22 @@
* on Enable
* off Disable
*/
-void __init noexec_setup(const char *str)
+static int __init noexec_setup(char *str)
{
- if (!strncmp(str, "on",2) && cpu_has_nx) {
- __supported_pte_mask |= _PAGE_NX;
- disable_nx = 0;
- } else if (!strncmp(str,"off",3)) {
+ if (!str || !strcmp(str, "on")) {
+ if (cpu_has_nx) {
+ __supported_pte_mask |= _PAGE_NX;
+ disable_nx = 0;
+ }
+ } else if (!strcmp(str,"off")) {
disable_nx = 1;
__supported_pte_mask &= ~_PAGE_NX;
- }
+ } else
+ return -EINVAL;
+
+ return 0;
}
+early_param("noexec", noexec_setup);
int nx_enabled = 0;
#ifdef CONFIG_X86_PAE
@@ -552,18 +558,6 @@
}
}
-static void __init set_max_mapnr_init(void)
-{
-#ifdef CONFIG_HIGHMEM
- num_physpages = highend_pfn;
-#else
- num_physpages = max_low_pfn;
-#endif
-#ifdef CONFIG_FLATMEM
- max_mapnr = num_physpages;
-#endif
-}
-
static struct kcore_list kcore_mem, kcore_vmalloc;
void __init mem_init(void)
@@ -590,14 +584,6 @@
}
#endif
- set_max_mapnr_init();
-
-#ifdef CONFIG_HIGHMEM
- high_memory = (void *) __va(highstart_pfn * PAGE_SIZE - 1) + 1;
-#else
- high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
-#endif
-
/* this will put all low memory onto the freelists */
totalram_pages += free_all_bootmem();
diff --git a/arch/i386/oprofile/nmi_int.c b/arch/i386/oprofile/nmi_int.c
index 5f8dc8a..3700eef 100644
--- a/arch/i386/oprofile/nmi_int.c
+++ b/arch/i386/oprofile/nmi_int.c
@@ -17,14 +17,15 @@
#include <asm/nmi.h>
#include <asm/msr.h>
#include <asm/apic.h>
+#include <asm/kdebug.h>
#include "op_counter.h"
#include "op_x86_model.h"
-
+
static struct op_x86_model_spec const * model;
static struct op_msrs cpu_msrs[NR_CPUS];
static unsigned long saved_lvtpc[NR_CPUS];
-
+
static int nmi_start(void);
static void nmi_stop(void);
@@ -82,13 +83,24 @@
#define exit_driverfs() do { } while (0)
#endif /* CONFIG_PM */
-
-static int nmi_callback(struct pt_regs * regs, int cpu)
+static int profile_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
{
- return model->check_ctrs(regs, &cpu_msrs[cpu]);
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+ int cpu = smp_processor_id();
+
+ switch(val) {
+ case DIE_NMI:
+ if (model->check_ctrs(args->regs, &cpu_msrs[cpu]))
+ ret = NOTIFY_STOP;
+ break;
+ default:
+ break;
+ }
+ return ret;
}
-
-
+
static void nmi_cpu_save_registers(struct op_msrs * msrs)
{
unsigned int const nr_ctrs = model->num_counters;
@@ -98,15 +110,19 @@
unsigned int i;
for (i = 0; i < nr_ctrs; ++i) {
- rdmsr(counters[i].addr,
- counters[i].saved.low,
- counters[i].saved.high);
+ if (counters[i].addr){
+ rdmsr(counters[i].addr,
+ counters[i].saved.low,
+ counters[i].saved.high);
+ }
}
for (i = 0; i < nr_ctrls; ++i) {
- rdmsr(controls[i].addr,
- controls[i].saved.low,
- controls[i].saved.high);
+ if (controls[i].addr){
+ rdmsr(controls[i].addr,
+ controls[i].saved.low,
+ controls[i].saved.high);
+ }
}
}
@@ -170,27 +186,29 @@
apic_write(APIC_LVTPC, APIC_DM_NMI);
}
+static struct notifier_block profile_exceptions_nb = {
+ .notifier_call = profile_exceptions_notify,
+ .next = NULL,
+ .priority = 0
+};
static int nmi_setup(void)
{
+ int err=0;
+
if (!allocate_msrs())
return -ENOMEM;
- /* We walk a thin line between law and rape here.
- * We need to be careful to install our NMI handler
- * without actually triggering any NMIs as this will
- * break the core code horrifically.
- */
- if (reserve_lapic_nmi() < 0) {
+ if ((err = register_die_notifier(&profile_exceptions_nb))){
free_msrs();
- return -EBUSY;
+ return err;
}
+
/* We need to serialize save and setup for HT because the subset
* of msrs are distinct for save and setup operations
*/
on_each_cpu(nmi_save_registers, NULL, 0, 1);
on_each_cpu(nmi_cpu_setup, NULL, 0, 1);
- set_nmi_callback(nmi_callback);
nmi_enabled = 1;
return 0;
}
@@ -205,15 +223,19 @@
unsigned int i;
for (i = 0; i < nr_ctrls; ++i) {
- wrmsr(controls[i].addr,
- controls[i].saved.low,
- controls[i].saved.high);
+ if (controls[i].addr){
+ wrmsr(controls[i].addr,
+ controls[i].saved.low,
+ controls[i].saved.high);
+ }
}
for (i = 0; i < nr_ctrs; ++i) {
- wrmsr(counters[i].addr,
- counters[i].saved.low,
- counters[i].saved.high);
+ if (counters[i].addr){
+ wrmsr(counters[i].addr,
+ counters[i].saved.low,
+ counters[i].saved.high);
+ }
}
}
@@ -234,6 +256,7 @@
apic_write(APIC_LVTPC, saved_lvtpc[cpu]);
apic_write(APIC_LVTERR, v);
nmi_restore_registers(msrs);
+ model->shutdown(msrs);
}
@@ -241,8 +264,7 @@
{
nmi_enabled = 0;
on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
- unset_nmi_callback();
- release_lapic_nmi();
+ unregister_die_notifier(&profile_exceptions_nb);
free_msrs();
}
@@ -284,6 +306,14 @@
struct dentry * dir;
char buf[4];
+ /* quick little hack to _not_ expose a counter if it is not
+ * available for use. This should protect userspace app.
+ * NOTE: assumes 1:1 mapping here (that counters are organized
+ * sequentially in their struct assignment).
+ */
+ if (unlikely(!avail_to_resrv_perfctr_nmi_bit(i)))
+ continue;
+
snprintf(buf, sizeof(buf), "%d", i);
dir = oprofilefs_mkdir(sb, root, buf);
oprofilefs_create_ulong(sb, dir, "enabled", &counter_config[i].enabled);
diff --git a/arch/i386/oprofile/nmi_timer_int.c b/arch/i386/oprofile/nmi_timer_int.c
index 930a112..abf0ba5 100644
--- a/arch/i386/oprofile/nmi_timer_int.c
+++ b/arch/i386/oprofile/nmi_timer_int.c
@@ -17,34 +17,49 @@
#include <asm/nmi.h>
#include <asm/apic.h>
#include <asm/ptrace.h>
+#include <asm/kdebug.h>
-static int nmi_timer_callback(struct pt_regs * regs, int cpu)
+static int profile_timer_exceptions_notify(struct notifier_block *self,
+ unsigned long val, void *data)
{
- oprofile_add_sample(regs, 0);
- return 1;
+ struct die_args *args = (struct die_args *)data;
+ int ret = NOTIFY_DONE;
+
+ switch(val) {
+ case DIE_NMI:
+ oprofile_add_sample(args->regs, 0);
+ ret = NOTIFY_STOP;
+ break;
+ default:
+ break;
+ }
+ return ret;
}
+static struct notifier_block profile_timer_exceptions_nb = {
+ .notifier_call = profile_timer_exceptions_notify,
+ .next = NULL,
+ .priority = 0
+};
+
static int timer_start(void)
{
- disable_timer_nmi_watchdog();
- set_nmi_callback(nmi_timer_callback);
+ if (register_die_notifier(&profile_timer_exceptions_nb))
+ return 1;
return 0;
}
static void timer_stop(void)
{
- enable_timer_nmi_watchdog();
- unset_nmi_callback();
+ unregister_die_notifier(&profile_timer_exceptions_nb);
synchronize_sched(); /* Allow already-started NMIs to complete. */
}
int __init op_nmi_timer_init(struct oprofile_operations * ops)
{
- extern int nmi_active;
-
- if (nmi_active <= 0)
+ if ((nmi_watchdog != NMI_IO_APIC) || (atomic_read(&nmi_active) <= 0))
return -ENODEV;
ops->start = timer_start;
diff --git a/arch/i386/oprofile/op_model_athlon.c b/arch/i386/oprofile/op_model_athlon.c
index 693bdea..3057a19 100644
--- a/arch/i386/oprofile/op_model_athlon.c
+++ b/arch/i386/oprofile/op_model_athlon.c
@@ -21,10 +21,12 @@
#define NUM_COUNTERS 4
#define NUM_CONTROLS 4
+#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0)
#define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0)
#define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(unsigned int)(l), -1);} while (0)
#define CTR_OVERFLOWED(n) (!((n) & (1U<<31)))
+#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0)
#define CTRL_READ(l,h,msrs,c) do {rdmsr(msrs->controls[(c)].addr, (l), (h));} while (0)
#define CTRL_WRITE(l,h,msrs,c) do {wrmsr(msrs->controls[(c)].addr, (l), (h));} while (0)
#define CTRL_SET_ACTIVE(n) (n |= (1<<22))
@@ -40,15 +42,21 @@
static void athlon_fill_in_addresses(struct op_msrs * const msrs)
{
- msrs->counters[0].addr = MSR_K7_PERFCTR0;
- msrs->counters[1].addr = MSR_K7_PERFCTR1;
- msrs->counters[2].addr = MSR_K7_PERFCTR2;
- msrs->counters[3].addr = MSR_K7_PERFCTR3;
+ int i;
- msrs->controls[0].addr = MSR_K7_EVNTSEL0;
- msrs->controls[1].addr = MSR_K7_EVNTSEL1;
- msrs->controls[2].addr = MSR_K7_EVNTSEL2;
- msrs->controls[3].addr = MSR_K7_EVNTSEL3;
+ for (i=0; i < NUM_COUNTERS; i++) {
+ if (reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i))
+ msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
+ else
+ msrs->counters[i].addr = 0;
+ }
+
+ for (i=0; i < NUM_CONTROLS; i++) {
+ if (reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i))
+ msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
+ else
+ msrs->controls[i].addr = 0;
+ }
}
@@ -59,19 +67,23 @@
/* clear all counters */
for (i = 0 ; i < NUM_CONTROLS; ++i) {
+ if (unlikely(!CTRL_IS_RESERVED(msrs,i)))
+ continue;
CTRL_READ(low, high, msrs, i);
CTRL_CLEAR(low);
CTRL_WRITE(low, high, msrs, i);
}
-
+
/* avoid a false detection of ctr overflows in NMI handler */
for (i = 0; i < NUM_COUNTERS; ++i) {
+ if (unlikely(!CTR_IS_RESERVED(msrs,i)))
+ continue;
CTR_WRITE(1, msrs, i);
}
/* enable active counters */
for (i = 0; i < NUM_COUNTERS; ++i) {
- if (counter_config[i].enabled) {
+ if ((counter_config[i].enabled) && (CTR_IS_RESERVED(msrs,i))) {
reset_value[i] = counter_config[i].count;
CTR_WRITE(counter_config[i].count, msrs, i);
@@ -98,6 +110,8 @@
int i;
for (i = 0 ; i < NUM_COUNTERS; ++i) {
+ if (!reset_value[i])
+ continue;
CTR_READ(low, high, msrs, i);
if (CTR_OVERFLOWED(low)) {
oprofile_add_sample(regs, i);
@@ -132,12 +146,27 @@
/* Subtle: stop on all counters to avoid race with
* setting our pm callback */
for (i = 0 ; i < NUM_COUNTERS ; ++i) {
+ if (!reset_value[i])
+ continue;
CTRL_READ(low, high, msrs, i);
CTRL_SET_INACTIVE(low);
CTRL_WRITE(low, high, msrs, i);
}
}
+static void athlon_shutdown(struct op_msrs const * const msrs)
+{
+ int i;
+
+ for (i = 0 ; i < NUM_COUNTERS ; ++i) {
+ if (CTR_IS_RESERVED(msrs,i))
+ release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+ }
+ for (i = 0 ; i < NUM_CONTROLS ; ++i) {
+ if (CTRL_IS_RESERVED(msrs,i))
+ release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
+ }
+}
struct op_x86_model_spec const op_athlon_spec = {
.num_counters = NUM_COUNTERS,
@@ -146,5 +175,6 @@
.setup_ctrs = &athlon_setup_ctrs,
.check_ctrs = &athlon_check_ctrs,
.start = &athlon_start,
- .stop = &athlon_stop
+ .stop = &athlon_stop,
+ .shutdown = &athlon_shutdown
};
diff --git a/arch/i386/oprofile/op_model_p4.c b/arch/i386/oprofile/op_model_p4.c
index 7c61d35..4792592 100644
--- a/arch/i386/oprofile/op_model_p4.c
+++ b/arch/i386/oprofile/op_model_p4.c
@@ -32,7 +32,7 @@
#define NUM_CONTROLS_HT2 (NUM_ESCRS_HT2 + NUM_CCCRS_HT2)
static unsigned int num_counters = NUM_COUNTERS_NON_HT;
-
+static unsigned int num_controls = NUM_CONTROLS_NON_HT;
/* this has to be checked dynamically since the
hyper-threadedness of a chip is discovered at
@@ -40,8 +40,10 @@
static inline void setup_num_counters(void)
{
#ifdef CONFIG_SMP
- if (smp_num_siblings == 2)
+ if (smp_num_siblings == 2){
num_counters = NUM_COUNTERS_HT2;
+ num_controls = NUM_CONTROLS_HT2;
+ }
#endif
}
@@ -97,15 +99,6 @@
#define NUM_UNUSED_CCCRS NUM_CCCRS_NON_HT - NUM_COUNTERS_NON_HT
-/* All cccr we don't use. */
-static int p4_unused_cccr[NUM_UNUSED_CCCRS] = {
- MSR_P4_BPU_CCCR1, MSR_P4_BPU_CCCR3,
- MSR_P4_MS_CCCR1, MSR_P4_MS_CCCR3,
- MSR_P4_FLAME_CCCR1, MSR_P4_FLAME_CCCR3,
- MSR_P4_IQ_CCCR0, MSR_P4_IQ_CCCR1,
- MSR_P4_IQ_CCCR2, MSR_P4_IQ_CCCR3
-};
-
/* p4 event codes in libop/op_event.h are indices into this table. */
static struct p4_event_binding p4_events[NUM_EVENTS] = {
@@ -372,6 +365,8 @@
#define CCCR_OVF_P(cccr) ((cccr) & (1U<<31))
#define CCCR_CLEAR_OVF(cccr) ((cccr) &= (~(1U<<31)))
+#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0)
+#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0)
#define CTR_READ(l,h,i) do {rdmsr(p4_counters[(i)].counter_address, (l), (h));} while (0)
#define CTR_WRITE(l,i) do {wrmsr(p4_counters[(i)].counter_address, -(u32)(l), -1);} while (0)
#define CTR_OVERFLOW_P(ctr) (!((ctr) & 0x80000000))
@@ -401,29 +396,34 @@
static void p4_fill_in_addresses(struct op_msrs * const msrs)
{
unsigned int i;
- unsigned int addr, stag;
+ unsigned int addr, cccraddr, stag;
setup_num_counters();
stag = get_stagger();
- /* the counter registers we pay attention to */
+ /* initialize some registers */
for (i = 0; i < num_counters; ++i) {
- msrs->counters[i].addr =
- p4_counters[VIRT_CTR(stag, i)].counter_address;
+ msrs->counters[i].addr = 0;
}
-
- /* FIXME: bad feeling, we don't save the 10 counters we don't use. */
-
- /* 18 CCCR registers */
- for (i = 0, addr = MSR_P4_BPU_CCCR0 + stag;
- addr <= MSR_P4_IQ_CCCR5; ++i, addr += addr_increment()) {
- msrs->controls[i].addr = addr;
+ for (i = 0; i < num_controls; ++i) {
+ msrs->controls[i].addr = 0;
}
+ /* the counter & cccr registers we pay attention to */
+ for (i = 0; i < num_counters; ++i) {
+ addr = p4_counters[VIRT_CTR(stag, i)].counter_address;
+ cccraddr = p4_counters[VIRT_CTR(stag, i)].cccr_address;
+ if (reserve_perfctr_nmi(addr)){
+ msrs->counters[i].addr = addr;
+ msrs->controls[i].addr = cccraddr;
+ }
+ }
+
/* 43 ESCR registers in three or four discontiguous group */
for (addr = MSR_P4_BSU_ESCR0 + stag;
addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
- msrs->controls[i].addr = addr;
+ if (reserve_evntsel_nmi(addr))
+ msrs->controls[i].addr = addr;
}
/* no IQ_ESCR0/1 on some models, we save a seconde time BSU_ESCR0/1
@@ -431,47 +431,57 @@
if (boot_cpu_data.x86_model >= 0x3) {
for (addr = MSR_P4_BSU_ESCR0 + stag;
addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
- msrs->controls[i].addr = addr;
+ if (reserve_evntsel_nmi(addr))
+ msrs->controls[i].addr = addr;
}
} else {
for (addr = MSR_P4_IQ_ESCR0 + stag;
addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
- msrs->controls[i].addr = addr;
+ if (reserve_evntsel_nmi(addr))
+ msrs->controls[i].addr = addr;
}
}
for (addr = MSR_P4_RAT_ESCR0 + stag;
addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
- msrs->controls[i].addr = addr;
+ if (reserve_evntsel_nmi(addr))
+ msrs->controls[i].addr = addr;
}
for (addr = MSR_P4_MS_ESCR0 + stag;
addr <= MSR_P4_TC_ESCR1; ++i, addr += addr_increment()) {
- msrs->controls[i].addr = addr;
+ if (reserve_evntsel_nmi(addr))
+ msrs->controls[i].addr = addr;
}
for (addr = MSR_P4_IX_ESCR0 + stag;
addr <= MSR_P4_CRU_ESCR3; ++i, addr += addr_increment()) {
- msrs->controls[i].addr = addr;
+ if (reserve_evntsel_nmi(addr))
+ msrs->controls[i].addr = addr;
}
/* there are 2 remaining non-contiguously located ESCRs */
if (num_counters == NUM_COUNTERS_NON_HT) {
/* standard non-HT CPUs handle both remaining ESCRs*/
- msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
- msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
+ if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR5))
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR4))
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
} else if (stag == 0) {
/* HT CPUs give the first remainder to the even thread, as
the 32nd control register */
- msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
+ if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR4))
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
} else {
/* and two copies of the second to the odd thread,
for the 22st and 23nd control registers */
- msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
- msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR5)) {
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ }
}
}
@@ -544,7 +554,6 @@
{
unsigned int i;
unsigned int low, high;
- unsigned int addr;
unsigned int stag;
stag = get_stagger();
@@ -557,59 +566,24 @@
/* clear the cccrs we will use */
for (i = 0 ; i < num_counters ; i++) {
+ if (unlikely(!CTRL_IS_RESERVED(msrs,i)))
+ continue;
rdmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high);
CCCR_CLEAR(low);
CCCR_SET_REQUIRED_BITS(low);
wrmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high);
}
- /* clear cccrs outside our concern */
- for (i = stag ; i < NUM_UNUSED_CCCRS ; i += addr_increment()) {
- rdmsr(p4_unused_cccr[i], low, high);
- CCCR_CLEAR(low);
- CCCR_SET_REQUIRED_BITS(low);
- wrmsr(p4_unused_cccr[i], low, high);
- }
-
/* clear all escrs (including those outside our concern) */
- for (addr = MSR_P4_BSU_ESCR0 + stag;
- addr < MSR_P4_IQ_ESCR0; addr += addr_increment()) {
- wrmsr(addr, 0, 0);
+ for (i = num_counters; i < num_controls; i++) {
+ if (unlikely(!CTRL_IS_RESERVED(msrs,i)))
+ continue;
+ wrmsr(msrs->controls[i].addr, 0, 0);
}
- /* On older models clear also MSR_P4_IQ_ESCR0/1 */
- if (boot_cpu_data.x86_model < 0x3) {
- wrmsr(MSR_P4_IQ_ESCR0, 0, 0);
- wrmsr(MSR_P4_IQ_ESCR1, 0, 0);
- }
-
- for (addr = MSR_P4_RAT_ESCR0 + stag;
- addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
- wrmsr(addr, 0, 0);
- }
-
- for (addr = MSR_P4_MS_ESCR0 + stag;
- addr <= MSR_P4_TC_ESCR1; addr += addr_increment()){
- wrmsr(addr, 0, 0);
- }
-
- for (addr = MSR_P4_IX_ESCR0 + stag;
- addr <= MSR_P4_CRU_ESCR3; addr += addr_increment()){
- wrmsr(addr, 0, 0);
- }
-
- if (num_counters == NUM_COUNTERS_NON_HT) {
- wrmsr(MSR_P4_CRU_ESCR4, 0, 0);
- wrmsr(MSR_P4_CRU_ESCR5, 0, 0);
- } else if (stag == 0) {
- wrmsr(MSR_P4_CRU_ESCR4, 0, 0);
- } else {
- wrmsr(MSR_P4_CRU_ESCR5, 0, 0);
- }
-
/* setup all counters */
for (i = 0 ; i < num_counters ; ++i) {
- if (counter_config[i].enabled) {
+ if ((counter_config[i].enabled) && (CTRL_IS_RESERVED(msrs,i))) {
reset_value[i] = counter_config[i].count;
pmc_setup_one_p4_counter(i);
CTR_WRITE(counter_config[i].count, VIRT_CTR(stag, i));
@@ -696,12 +670,32 @@
stag = get_stagger();
for (i = 0; i < num_counters; ++i) {
+ if (!reset_value[i])
+ continue;
CCCR_READ(low, high, VIRT_CTR(stag, i));
CCCR_SET_DISABLE(low);
CCCR_WRITE(low, high, VIRT_CTR(stag, i));
}
}
+static void p4_shutdown(struct op_msrs const * const msrs)
+{
+ int i;
+
+ for (i = 0 ; i < num_counters ; ++i) {
+ if (CTR_IS_RESERVED(msrs,i))
+ release_perfctr_nmi(msrs->counters[i].addr);
+ }
+ /* some of the control registers are specially reserved in
+ * conjunction with the counter registers (hence the starting offset).
+ * This saves a few bits.
+ */
+ for (i = num_counters ; i < num_controls ; ++i) {
+ if (CTRL_IS_RESERVED(msrs,i))
+ release_evntsel_nmi(msrs->controls[i].addr);
+ }
+}
+
#ifdef CONFIG_SMP
struct op_x86_model_spec const op_p4_ht2_spec = {
@@ -711,7 +705,8 @@
.setup_ctrs = &p4_setup_ctrs,
.check_ctrs = &p4_check_ctrs,
.start = &p4_start,
- .stop = &p4_stop
+ .stop = &p4_stop,
+ .shutdown = &p4_shutdown
};
#endif
@@ -722,5 +717,6 @@
.setup_ctrs = &p4_setup_ctrs,
.check_ctrs = &p4_check_ctrs,
.start = &p4_start,
- .stop = &p4_stop
+ .stop = &p4_stop,
+ .shutdown = &p4_shutdown
};
diff --git a/arch/i386/oprofile/op_model_ppro.c b/arch/i386/oprofile/op_model_ppro.c
index 5c3ab4b..f88e05b 100644
--- a/arch/i386/oprofile/op_model_ppro.c
+++ b/arch/i386/oprofile/op_model_ppro.c
@@ -22,10 +22,12 @@
#define NUM_COUNTERS 2
#define NUM_CONTROLS 2
+#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0)
#define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0)
#define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(u32)(l), -1);} while (0)
#define CTR_OVERFLOWED(n) (!((n) & (1U<<31)))
+#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0)
#define CTRL_READ(l,h,msrs,c) do {rdmsr((msrs->controls[(c)].addr), (l), (h));} while (0)
#define CTRL_WRITE(l,h,msrs,c) do {wrmsr((msrs->controls[(c)].addr), (l), (h));} while (0)
#define CTRL_SET_ACTIVE(n) (n |= (1<<22))
@@ -41,11 +43,21 @@
static void ppro_fill_in_addresses(struct op_msrs * const msrs)
{
- msrs->counters[0].addr = MSR_P6_PERFCTR0;
- msrs->counters[1].addr = MSR_P6_PERFCTR1;
+ int i;
+
+ for (i=0; i < NUM_COUNTERS; i++) {
+ if (reserve_perfctr_nmi(MSR_P6_PERFCTR0 + i))
+ msrs->counters[i].addr = MSR_P6_PERFCTR0 + i;
+ else
+ msrs->counters[i].addr = 0;
+ }
- msrs->controls[0].addr = MSR_P6_EVNTSEL0;
- msrs->controls[1].addr = MSR_P6_EVNTSEL1;
+ for (i=0; i < NUM_CONTROLS; i++) {
+ if (reserve_evntsel_nmi(MSR_P6_EVNTSEL0 + i))
+ msrs->controls[i].addr = MSR_P6_EVNTSEL0 + i;
+ else
+ msrs->controls[i].addr = 0;
+ }
}
@@ -56,6 +68,8 @@
/* clear all counters */
for (i = 0 ; i < NUM_CONTROLS; ++i) {
+ if (unlikely(!CTRL_IS_RESERVED(msrs,i)))
+ continue;
CTRL_READ(low, high, msrs, i);
CTRL_CLEAR(low);
CTRL_WRITE(low, high, msrs, i);
@@ -63,12 +77,14 @@
/* avoid a false detection of ctr overflows in NMI handler */
for (i = 0; i < NUM_COUNTERS; ++i) {
+ if (unlikely(!CTR_IS_RESERVED(msrs,i)))
+ continue;
CTR_WRITE(1, msrs, i);
}
/* enable active counters */
for (i = 0; i < NUM_COUNTERS; ++i) {
- if (counter_config[i].enabled) {
+ if ((counter_config[i].enabled) && (CTR_IS_RESERVED(msrs,i))) {
reset_value[i] = counter_config[i].count;
CTR_WRITE(counter_config[i].count, msrs, i);
@@ -81,6 +97,8 @@
CTRL_SET_UM(low, counter_config[i].unit_mask);
CTRL_SET_EVENT(low, counter_config[i].event);
CTRL_WRITE(low, high, msrs, i);
+ } else {
+ reset_value[i] = 0;
}
}
}
@@ -93,6 +111,8 @@
int i;
for (i = 0 ; i < NUM_COUNTERS; ++i) {
+ if (!reset_value[i])
+ continue;
CTR_READ(low, high, msrs, i);
if (CTR_OVERFLOWED(low)) {
oprofile_add_sample(regs, i);
@@ -118,18 +138,38 @@
static void ppro_start(struct op_msrs const * const msrs)
{
unsigned int low,high;
- CTRL_READ(low, high, msrs, 0);
- CTRL_SET_ACTIVE(low);
- CTRL_WRITE(low, high, msrs, 0);
+
+ if (reset_value[0]) {
+ CTRL_READ(low, high, msrs, 0);
+ CTRL_SET_ACTIVE(low);
+ CTRL_WRITE(low, high, msrs, 0);
+ }
}
static void ppro_stop(struct op_msrs const * const msrs)
{
unsigned int low,high;
- CTRL_READ(low, high, msrs, 0);
- CTRL_SET_INACTIVE(low);
- CTRL_WRITE(low, high, msrs, 0);
+
+ if (reset_value[0]) {
+ CTRL_READ(low, high, msrs, 0);
+ CTRL_SET_INACTIVE(low);
+ CTRL_WRITE(low, high, msrs, 0);
+ }
+}
+
+static void ppro_shutdown(struct op_msrs const * const msrs)
+{
+ int i;
+
+ for (i = 0 ; i < NUM_COUNTERS ; ++i) {
+ if (CTR_IS_RESERVED(msrs,i))
+ release_perfctr_nmi(MSR_P6_PERFCTR0 + i);
+ }
+ for (i = 0 ; i < NUM_CONTROLS ; ++i) {
+ if (CTRL_IS_RESERVED(msrs,i))
+ release_evntsel_nmi(MSR_P6_EVNTSEL0 + i);
+ }
}
@@ -140,5 +180,6 @@
.setup_ctrs = &ppro_setup_ctrs,
.check_ctrs = &ppro_check_ctrs,
.start = &ppro_start,
- .stop = &ppro_stop
+ .stop = &ppro_stop,
+ .shutdown = &ppro_shutdown
};
diff --git a/arch/i386/oprofile/op_x86_model.h b/arch/i386/oprofile/op_x86_model.h
index 123b7e9..abb1aa9 100644
--- a/arch/i386/oprofile/op_x86_model.h
+++ b/arch/i386/oprofile/op_x86_model.h
@@ -40,6 +40,7 @@
struct op_msrs const * const msrs);
void (*start)(struct op_msrs const * const msrs);
void (*stop)(struct op_msrs const * const msrs);
+ void (*shutdown)(struct op_msrs const * const msrs);
};
extern struct op_x86_model_spec const op_ppro_spec;
diff --git a/arch/i386/pci/Makefile b/arch/i386/pci/Makefile
index 62ad75c..1594d2f 100644
--- a/arch/i386/pci/Makefile
+++ b/arch/i386/pci/Makefile
@@ -11,4 +11,4 @@
pci-$(CONFIG_X86_VISWS) := visws.o fixup.o
pci-$(CONFIG_X86_NUMAQ) := numa.o irq.o
-obj-y += $(pci-y) common.o
+obj-y += $(pci-y) common.o early.o
diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c
index 0a362e3..68bce194 100644
--- a/arch/i386/pci/common.c
+++ b/arch/i386/pci/common.c
@@ -242,6 +242,10 @@
acpi_noirq_set();
return NULL;
}
+ else if (!strcmp(str, "noearly")) {
+ pci_probe |= PCI_PROBE_NOEARLY;
+ return NULL;
+ }
#ifndef CONFIG_X86_VISWS
else if (!strcmp(str, "usepirqmask")) {
pci_probe |= PCI_USE_PIRQ_MASK;
diff --git a/arch/i386/pci/direct.c b/arch/i386/pci/direct.c
index 5d81fb5..5acf0b4 100644
--- a/arch/i386/pci/direct.c
+++ b/arch/i386/pci/direct.c
@@ -254,7 +254,16 @@
return works;
}
-void __init pci_direct_init(void)
+void __init pci_direct_init(int type)
+{
+ printk(KERN_INFO "PCI: Using configuration type %d\n", type);
+ if (type == 1)
+ raw_pci_ops = &pci_direct_conf1;
+ else
+ raw_pci_ops = &pci_direct_conf2;
+}
+
+int __init pci_direct_probe(void)
{
struct resource *region, *region2;
@@ -264,19 +273,16 @@
if (!region)
goto type2;
- if (pci_check_type1()) {
- printk(KERN_INFO "PCI: Using configuration type 1\n");
- raw_pci_ops = &pci_direct_conf1;
- return;
- }
+ if (pci_check_type1())
+ return 1;
release_resource(region);
type2:
if ((pci_probe & PCI_PROBE_CONF2) == 0)
- return;
+ return 0;
region = request_region(0xCF8, 4, "PCI conf2");
if (!region)
- return;
+ return 0;
region2 = request_region(0xC000, 0x1000, "PCI conf2");
if (!region2)
goto fail2;
@@ -284,10 +290,11 @@
if (pci_check_type2()) {
printk(KERN_INFO "PCI: Using configuration type 2\n");
raw_pci_ops = &pci_direct_conf2;
- return;
+ return 2;
}
release_resource(region2);
fail2:
release_resource(region);
+ return 0;
}
diff --git a/arch/i386/pci/early.c b/arch/i386/pci/early.c
new file mode 100644
index 0000000..713d6c8
--- /dev/null
+++ b/arch/i386/pci/early.c
@@ -0,0 +1,52 @@
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <asm/pci-direct.h>
+#include <asm/io.h>
+#include "pci.h"
+
+/* Direct PCI access. This is used for PCI accesses in early boot before
+ the PCI subsystem works. */
+
+#define PDprintk(x...)
+
+u32 read_pci_config(u8 bus, u8 slot, u8 func, u8 offset)
+{
+ u32 v;
+ outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
+ v = inl(0xcfc);
+ if (v != 0xffffffff)
+ PDprintk("%x reading 4 from %x: %x\n", slot, offset, v);
+ return v;
+}
+
+u8 read_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset)
+{
+ u8 v;
+ outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
+ v = inb(0xcfc + (offset&3));
+ PDprintk("%x reading 1 from %x: %x\n", slot, offset, v);
+ return v;
+}
+
+u16 read_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset)
+{
+ u16 v;
+ outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
+ v = inw(0xcfc + (offset&2));
+ PDprintk("%x reading 2 from %x: %x\n", slot, offset, v);
+ return v;
+}
+
+void write_pci_config(u8 bus, u8 slot, u8 func, u8 offset,
+ u32 val)
+{
+ PDprintk("%x writing to %x: %x\n", slot, offset, val);
+ outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
+ outl(val, 0xcfc);
+}
+
+int early_pci_allowed(void)
+{
+ return (pci_probe & (PCI_PROBE_CONF1|PCI_PROBE_NOEARLY)) ==
+ PCI_PROBE_CONF1;
+}
diff --git a/arch/i386/pci/init.c b/arch/i386/pci/init.c
index 51087a9..d028e1b 100644
--- a/arch/i386/pci/init.c
+++ b/arch/i386/pci/init.c
@@ -6,8 +6,13 @@
in the right sequence from here. */
static __init int pci_access_init(void)
{
+ int type = 0;
+
+#ifdef CONFIG_PCI_DIRECT
+ type = pci_direct_probe();
+#endif
#ifdef CONFIG_PCI_MMCONFIG
- pci_mmcfg_init();
+ pci_mmcfg_init(type);
#endif
if (raw_pci_ops)
return 0;
@@ -21,7 +26,7 @@
* fails.
*/
#ifdef CONFIG_PCI_DIRECT
- pci_direct_init();
+ pci_direct_init(type);
#endif
return 0;
}
diff --git a/arch/i386/pci/mmconfig.c b/arch/i386/pci/mmconfig.c
index 972180f..05be8db 100644
--- a/arch/i386/pci/mmconfig.c
+++ b/arch/i386/pci/mmconfig.c
@@ -151,6 +151,38 @@
.write = pci_mmcfg_write,
};
+
+static __init void pci_mmcfg_insert_resources(void)
+{
+#define PCI_MMCFG_RESOURCE_NAME_LEN 19
+ int i;
+ struct resource *res;
+ char *names;
+ unsigned num_buses;
+
+ res = kcalloc(PCI_MMCFG_RESOURCE_NAME_LEN + sizeof(*res),
+ pci_mmcfg_config_num, GFP_KERNEL);
+
+ if (!res) {
+ printk(KERN_ERR "PCI: Unable to allocate MMCONFIG resources\n");
+ return;
+ }
+
+ names = (void *)&res[pci_mmcfg_config_num];
+ for (i = 0; i < pci_mmcfg_config_num; i++, res++) {
+ num_buses = pci_mmcfg_config[i].end_bus_number -
+ pci_mmcfg_config[i].start_bus_number + 1;
+ res->name = names;
+ snprintf(names, PCI_MMCFG_RESOURCE_NAME_LEN, "PCI MMCONFIG %u",
+ pci_mmcfg_config[i].pci_segment_group_number);
+ res->start = pci_mmcfg_config[i].base_address;
+ res->end = res->start + (num_buses << 20) - 1;
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ insert_resource(&iomem_resource, res);
+ names += PCI_MMCFG_RESOURCE_NAME_LEN;
+ }
+}
+
/* K8 systems have some devices (typically in the builtin northbridge)
that are only accessible using type1
Normally this can be expressed in the MCFG by not listing them
@@ -187,7 +219,9 @@
}
}
-void __init pci_mmcfg_init(void)
+
+
+void __init pci_mmcfg_init(int type)
{
if ((pci_probe & PCI_PROBE_MMCONF) == 0)
return;
@@ -198,7 +232,9 @@
(pci_mmcfg_config[0].base_address == 0))
return;
- if (!e820_all_mapped(pci_mmcfg_config[0].base_address,
+ /* Only do this check when type 1 works. If it doesn't work
+ assume we run on a Mac and always use MCFG */
+ if (type == 1 && !e820_all_mapped(pci_mmcfg_config[0].base_address,
pci_mmcfg_config[0].base_address + MMCONFIG_APER_MIN,
E820_RESERVED)) {
printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %x is not E820-reserved\n",
@@ -212,4 +248,5 @@
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
unreachable_devices();
+ pci_mmcfg_insert_resources();
}
diff --git a/arch/i386/pci/pci.h b/arch/i386/pci/pci.h
index bf4e793..1814f74 100644
--- a/arch/i386/pci/pci.h
+++ b/arch/i386/pci/pci.h
@@ -17,6 +17,7 @@
#define PCI_PROBE_CONF2 0x0004
#define PCI_PROBE_MMCONF 0x0008
#define PCI_PROBE_MASK 0x000f
+#define PCI_PROBE_NOEARLY 0x0010
#define PCI_NO_SORT 0x0100
#define PCI_BIOS_SORT 0x0200
@@ -81,7 +82,9 @@
extern int pci_conf1_read(unsigned int seg, unsigned int bus,
unsigned int devfn, int reg, int len, u32 *value);
-extern void pci_direct_init(void);
+extern int pci_direct_probe(void);
+extern void pci_direct_init(int type);
extern void pci_pcbios_init(void);
-extern void pci_mmcfg_init(void);
+extern void pci_mmcfg_init(int type);
extern void pcibios_sort(void);
+
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index de83f38..d9428a0 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -59,9 +59,7 @@
}
}
-void save_stack_trace(struct stack_trace *trace,
- struct task_struct *task, int all_contexts,
- unsigned int skip)
+void save_stack_trace(struct stack_trace *trace, struct task_struct *task)
{
register unsigned long sp asm ("15");
unsigned long orig_sp;
@@ -69,22 +67,23 @@
sp &= PSW_ADDR_INSN;
orig_sp = sp;
- sp = save_context_stack(trace, &skip, sp,
+ sp = save_context_stack(trace, &trace->skip, sp,
S390_lowcore.panic_stack - PAGE_SIZE,
S390_lowcore.panic_stack);
- if ((sp != orig_sp) && !all_contexts)
+ if ((sp != orig_sp) && !trace->all_contexts)
return;
- sp = save_context_stack(trace, &skip, sp,
+ sp = save_context_stack(trace, &trace->skip, sp,
S390_lowcore.async_stack - ASYNC_SIZE,
S390_lowcore.async_stack);
- if ((sp != orig_sp) && !all_contexts)
+ if ((sp != orig_sp) && !trace->all_contexts)
return;
if (task)
- save_context_stack(trace, &skip, sp,
+ save_context_stack(trace, &trace->skip, sp,
(unsigned long) task_stack_page(task),
(unsigned long) task_stack_page(task) + THREAD_SIZE);
else
- save_context_stack(trace, &skip, sp, S390_lowcore.thread_info,
+ save_context_stack(trace, &trace->skip, sp,
+ S390_lowcore.thread_info,
S390_lowcore.thread_info + THREAD_SIZE);
return;
}
diff --git a/arch/um/sys-i386/Makefile b/arch/um/sys-i386/Makefile
index 59cc702..0e32adf 100644
--- a/arch/um/sys-i386/Makefile
+++ b/arch/um/sys-i386/Makefile
@@ -4,7 +4,7 @@
obj-$(CONFIG_MODE_SKAS) += stub.o stub_segv.o
-subarch-obj-y = lib/bitops.o kernel/semaphore.o
+subarch-obj-y = lib/bitops.o lib/semaphore.o
subarch-obj-$(CONFIG_HIGHMEM) += mm/highmem.o
subarch-obj-$(CONFIG_MODULES) += kernel/module.o
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 581ce9a..efe249e 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -109,6 +109,7 @@
config X86_VSMP
bool "Support for ScaleMP vSMP"
+ depends on PCI
help
Support for ScaleMP vSMP systems. Say 'Y' here if this kernel is
supposed to run on these EM64T-based machines. Only choose this option
@@ -295,7 +296,7 @@
config K8_NUMA
bool "Old style AMD Opteron NUMA detection"
- depends on NUMA
+ depends on NUMA && PCI
default y
help
Enable K8 NUMA node topology detection. You should say Y here if
@@ -425,7 +426,6 @@
config CALGARY_IOMMU
bool "IBM Calgary IOMMU support"
- default y
select SWIOTLB
depends on PCI && EXPERIMENTAL
help
@@ -472,8 +472,7 @@
the DRAM Error Threshold.
config KEXEC
- bool "kexec system call (EXPERIMENTAL)"
- depends on EXPERIMENTAL
+ bool "kexec system call"
help
kexec is a system call that implements the ability to shutdown your
current kernel, and to start another kernel. It is like a reboot
@@ -492,7 +491,14 @@
bool "kernel crash dumps (EXPERIMENTAL)"
depends on EXPERIMENTAL
help
- Generate crash dump after being started by kexec.
+ Generate crash dump after being started by kexec.
+ This should be normally only set in special crash dump kernels
+ which are loaded in the main kernel with kexec-tools into
+ a specially reserved region and then later executed after
+ a crash by kdump/kexec. The crash dump kernel must be compiled
+ to a memory address not used by the main kernel or BIOS using
+ PHYSICAL_START.
+ For more details see Documentation/kdump/kdump.txt
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
@@ -530,6 +536,30 @@
If unsure, say Y. Only embedded should say N here.
+config CC_STACKPROTECTOR
+ bool "Enable -fstack-protector buffer overflow detection (EXPRIMENTAL)"
+ depends on EXPERIMENTAL
+ help
+ This option turns on the -fstack-protector GCC feature. This
+ feature puts, at the beginning of critical functions, a canary
+ value on the stack just before the return address, and validates
+ the value just before actually returning. Stack based buffer
+ overflows (that need to overwrite this return address) now also
+ overwrite the canary, which gets detected and the attack is then
+ neutralized via a kernel panic.
+
+ This feature requires gcc version 4.2 or above, or a distribution
+ gcc with the feature backported. Older versions are automatically
+ detected and for those versions, this configuration option is ignored.
+
+config CC_STACKPROTECTOR_ALL
+ bool "Use stack-protector for all functions"
+ depends on CC_STACKPROTECTOR
+ help
+ Normally, GCC only inserts the canary value protection for
+ functions that use large-ish on-stack buffers. By enabling
+ this option, GCC will be asked to do this for ALL functions.
+
source kernel/Kconfig.hz
config REORDER
diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile
index 431bb4b..1c0f18d 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86_64/Makefile
@@ -54,6 +54,16 @@
cflags-y += $(call cc-option,-funit-at-a-time)
# prevent gcc from generating any FP code by mistake
cflags-y += $(call cc-option,-mno-sse -mno-mmx -mno-sse2 -mno-3dnow,)
+# do binutils support CFI?
+cflags-y += $(call as-instr,.cfi_startproc\n.cfi_endproc,-DCONFIG_AS_CFI=1,)
+AFLAGS += $(call as-instr,.cfi_startproc\n.cfi_endproc,-DCONFIG_AS_CFI=1,)
+
+# is .cfi_signal_frame supported too?
+cflags-y += $(call as-instr,.cfi_startproc\n.cfi_signal_frame\n.cfi_endproc,-DCONFIG_AS_CFI_SIGNAL_FRAME=1,)
+AFLAGS += $(call as-instr,.cfi_startproc\n.cfi_signal_frame\n.cfi_endproc,-DCONFIG_AS_CFI_SIGNAL_FRAME=1,)
+
+cflags-$(CONFIG_CC_STACKPROTECTOR) += $(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-x86_64-has-stack-protector.sh $(CC) -fstack-protector )
+cflags-$(CONFIG_CC_STACKPROTECTOR_ALL) += $(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-x86_64-has-stack-protector.sh $(CC) -fstack-protector-all )
CFLAGS += $(cflags-y)
CFLAGS_KERNEL += $(cflags-kernel-y)
diff --git a/arch/x86_64/boot/compressed/Makefile b/arch/x86_64/boot/compressed/Makefile
index f89d96f..e70fa6e 100644
--- a/arch/x86_64/boot/compressed/Makefile
+++ b/arch/x86_64/boot/compressed/Makefile
@@ -7,7 +7,8 @@
#
targets := vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o
-EXTRA_AFLAGS := -traditional -m32
+EXTRA_AFLAGS := -traditional
+AFLAGS := $(subst -m64,-m32,$(AFLAGS))
# cannot use EXTRA_CFLAGS because base CFLAGS contains -mkernel which conflicts with
# -m32
diff --git a/arch/x86_64/boot/setup.S b/arch/x86_64/boot/setup.S
index a50b631..c3bfd22 100644
--- a/arch/x86_64/boot/setup.S
+++ b/arch/x86_64/boot/setup.S
@@ -526,12 +526,12 @@
movw %cs, %ax # aka SETUPSEG
subw $DELTA_INITSEG, %ax # aka INITSEG
movw %ax, %ds
- movw $0, (0x1ff) # default is no pointing device
+ movb $0, (0x1ff) # default is no pointing device
int $0x11 # int 0x11: equipment list
testb $0x04, %al # check if mouse installed
jz no_psmouse
- movw $0xAA, (0x1ff) # device present
+ movb $0xAA, (0x1ff) # device present
no_psmouse:
#include "../../i386/boot/edd.S"
diff --git a/arch/x86_64/defconfig b/arch/x86_64/defconfig
index 5fb9707..647610e 100644
--- a/arch/x86_64/defconfig
+++ b/arch/x86_64/defconfig
@@ -1,7 +1,7 @@
#
# Automatically generated make config: don't edit
-# Linux kernel version: 2.6.18-rc4
-# Thu Aug 24 21:05:55 2006
+# Linux kernel version: 2.6.18-git5
+# Tue Sep 26 09:30:47 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
@@ -19,6 +19,7 @@
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
+CONFIG_AUDIT_ARCH=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
@@ -38,16 +39,16 @@
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
-CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_CPUSETS is not set
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
-CONFIG_UID16=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_EMBEDDED is not set
+CONFIG_UID16=y
+CONFIG_SYSCTL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
@@ -56,12 +57,12 @@
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
-CONFIG_RT_MUTEXES=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
+CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set
@@ -160,6 +161,7 @@
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x200000
CONFIG_SECCOMP=y
+# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
@@ -307,18 +309,23 @@
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
-CONFIG_TCP_CONG_BIC=y
+CONFIG_TCP_CONG_CUBIC=y
+CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_IPV6=y
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
+# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET6_XFRM_MODE_TUNNEL is not set
+# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
# CONFIG_IPV6_TUNNEL is not set
+# CONFIG_IPV6_SUBTREES is not set
+# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
@@ -345,7 +352,6 @@
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
-# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
@@ -487,6 +493,7 @@
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
+CONFIG_SCSI_NETLINK=y
# CONFIG_SCSI_PROC_FS is not set
#
@@ -508,12 +515,13 @@
# CONFIG_SCSI_LOGGING is not set
#
-# SCSI Transport Attributes
+# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=y
+# CONFIG_SCSI_SAS_LIBSAS is not set
#
# SCSI low-level drivers
@@ -532,29 +540,14 @@
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
+# CONFIG_SCSI_AIC94XX is not set
+# CONFIG_SCSI_ARCMSR is not set
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=y
CONFIG_MEGARAID_MAILBOX=y
# CONFIG_MEGARAID_LEGACY is not set
CONFIG_MEGARAID_SAS=y
-CONFIG_SCSI_SATA=y
-CONFIG_SCSI_SATA_AHCI=y
-CONFIG_SCSI_SATA_SVW=y
-CONFIG_SCSI_ATA_PIIX=y
-# CONFIG_SCSI_SATA_MV is not set
-CONFIG_SCSI_SATA_NV=y
-# CONFIG_SCSI_PDC_ADMA is not set
# CONFIG_SCSI_HPTIOP is not set
-# CONFIG_SCSI_SATA_QSTOR is not set
-# CONFIG_SCSI_SATA_PROMISE is not set
-# CONFIG_SCSI_SATA_SX4 is not set
-CONFIG_SCSI_SATA_SIL=y
-# CONFIG_SCSI_SATA_SIL24 is not set
-# CONFIG_SCSI_SATA_SIS is not set
-# CONFIG_SCSI_SATA_ULI is not set
-CONFIG_SCSI_SATA_VIA=y
-# CONFIG_SCSI_SATA_VITESSE is not set
-CONFIG_SCSI_SATA_INTEL_COMBINED=y
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
@@ -563,6 +556,7 @@
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
+# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
@@ -573,6 +567,62 @@
# CONFIG_SCSI_DEBUG is not set
#
+# Serial ATA (prod) and Parallel ATA (experimental) drivers
+#
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+CONFIG_SATA_SVW=y
+CONFIG_ATA_PIIX=y
+# CONFIG_SATA_MV is not set
+CONFIG_SATA_NV=y
+# CONFIG_PDC_ADMA is not set
+# CONFIG_SATA_QSTOR is not set
+# CONFIG_SATA_PROMISE is not set
+# CONFIG_SATA_SX4 is not set
+CONFIG_SATA_SIL=y
+# CONFIG_SATA_SIL24 is not set
+# CONFIG_SATA_SIS is not set
+# CONFIG_SATA_ULI is not set
+CONFIG_SATA_VIA=y
+# CONFIG_SATA_VITESSE is not set
+CONFIG_SATA_INTEL_COMBINED=y
+# CONFIG_PATA_ALI is not set
+# CONFIG_PATA_AMD is not set
+# CONFIG_PATA_ARTOP is not set
+# CONFIG_PATA_ATIIXP is not set
+# CONFIG_PATA_CMD64X is not set
+# CONFIG_PATA_CS5520 is not set
+# CONFIG_PATA_CS5530 is not set
+# CONFIG_PATA_CYPRESS is not set
+# CONFIG_PATA_EFAR is not set
+# CONFIG_ATA_GENERIC is not set
+# CONFIG_PATA_HPT366 is not set
+# CONFIG_PATA_HPT37X is not set
+# CONFIG_PATA_HPT3X2N is not set
+# CONFIG_PATA_HPT3X3 is not set
+# CONFIG_PATA_IT821X is not set
+# CONFIG_PATA_JMICRON is not set
+# CONFIG_PATA_LEGACY is not set
+# CONFIG_PATA_TRIFLEX is not set
+# CONFIG_PATA_MPIIX is not set
+# CONFIG_PATA_OLDPIIX is not set
+# CONFIG_PATA_NETCELL is not set
+# CONFIG_PATA_NS87410 is not set
+# CONFIG_PATA_OPTI is not set
+# CONFIG_PATA_OPTIDMA is not set
+# CONFIG_PATA_PDC_OLD is not set
+# CONFIG_PATA_QDI is not set
+# CONFIG_PATA_RADISYS is not set
+# CONFIG_PATA_RZ1000 is not set
+# CONFIG_PATA_SC1200 is not set
+# CONFIG_PATA_SERVERWORKS is not set
+# CONFIG_PATA_PDC2027X is not set
+# CONFIG_PATA_SIL680 is not set
+# CONFIG_PATA_SIS is not set
+# CONFIG_PATA_VIA is not set
+# CONFIG_PATA_WINBOND is not set
+
+#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
@@ -678,6 +728,7 @@
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_B44=y
CONFIG_FORCEDETH=y
+# CONFIG_FORCEDETH_NAPI is not set
# CONFIG_DGRS is not set
# CONFIG_EEPRO100 is not set
CONFIG_E100=y
@@ -714,6 +765,7 @@
# CONFIG_VIA_VELOCITY is not set
CONFIG_TIGON3=y
CONFIG_BNX2=y
+# CONFIG_QLA3XXX is not set
#
# Ethernet (10000 Mbit)
@@ -1036,6 +1088,7 @@
# Open Sound System
#
CONFIG_SOUND_PRIME=y
+CONFIG_OSS_OBSOLETE_DRIVER=y
# CONFIG_SOUND_BT878 is not set
# CONFIG_SOUND_EMU10K1 is not set
# CONFIG_SOUND_FUSION is not set
@@ -1046,7 +1099,6 @@
# CONFIG_SOUND_MSNDPIN is not set
# CONFIG_SOUND_VIA82CXXX is not set
# CONFIG_SOUND_OSS is not set
-# CONFIG_SOUND_TVMIXER is not set
#
# USB support
@@ -1203,7 +1255,6 @@
# InfiniBand support
#
# CONFIG_INFINIBAND is not set
-# CONFIG_IPATH_CORE is not set
#
# EDAC - error detection and reporting (RAS) (EXPERIMENTAL)
@@ -1449,10 +1500,6 @@
# CONFIG_CRYPTO is not set
#
-# Hardware crypto devices
-#
-
-#
# Library routines
#
# CONFIG_CRC_CCITT is not set
diff --git a/arch/x86_64/ia32/ia32_aout.c b/arch/x86_64/ia32/ia32_aout.c
index 3bf58af..396d3c1 100644
--- a/arch/x86_64/ia32/ia32_aout.c
+++ b/arch/x86_64/ia32/ia32_aout.c
@@ -333,7 +333,8 @@
return error;
}
- error = bprm->file->f_op->read(bprm->file, (char *)text_addr,
+ error = bprm->file->f_op->read(bprm->file,
+ (char __user *)text_addr,
ex.a_text+ex.a_data, &pos);
if ((signed long)error < 0) {
send_sig(SIGKILL, current, 0);
@@ -366,7 +367,8 @@
down_write(¤t->mm->mmap_sem);
do_brk(N_TXTADDR(ex), ex.a_text+ex.a_data);
up_write(¤t->mm->mmap_sem);
- bprm->file->f_op->read(bprm->file,(char *)N_TXTADDR(ex),
+ bprm->file->f_op->read(bprm->file,
+ (char __user *)N_TXTADDR(ex),
ex.a_text+ex.a_data, &pos);
flush_icache_range((unsigned long) N_TXTADDR(ex),
(unsigned long) N_TXTADDR(ex) +
@@ -477,7 +479,7 @@
do_brk(start_addr, ex.a_text + ex.a_data + ex.a_bss);
up_write(¤t->mm->mmap_sem);
- file->f_op->read(file, (char *)start_addr,
+ file->f_op->read(file, (char __user *)start_addr,
ex.a_text + ex.a_data, &pos);
flush_icache_range((unsigned long) start_addr,
(unsigned long) start_addr + ex.a_text + ex.a_data);
diff --git a/arch/x86_64/ia32/ia32_signal.c b/arch/x86_64/ia32/ia32_signal.c
index 25e5ca2..a6ba995 100644
--- a/arch/x86_64/ia32/ia32_signal.c
+++ b/arch/x86_64/ia32/ia32_signal.c
@@ -113,25 +113,19 @@
}
asmlinkage long
-sys32_sigsuspend(int history0, int history1, old_sigset_t mask,
- struct pt_regs *regs)
+sys32_sigsuspend(int history0, int history1, old_sigset_t mask)
{
- sigset_t saveset;
-
mask &= _BLOCKABLE;
spin_lock_irq(¤t->sighand->siglock);
- saveset = current->blocked;
+ current->saved_sigmask = current->blocked;
siginitset(¤t->blocked, mask);
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
- regs->rax = -EINTR;
- while (1) {
- current->state = TASK_INTERRUPTIBLE;
- schedule();
- if (do_signal(regs, &saveset))
- return -EINTR;
- }
+ current->state = TASK_INTERRUPTIBLE;
+ schedule();
+ set_thread_flag(TIF_RESTORE_SIGMASK);
+ return -ERESTARTNOHAND;
}
asmlinkage long
@@ -437,15 +431,7 @@
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
goto give_sigsegv;
- {
- struct exec_domain *ed = current_thread_info()->exec_domain;
- err |= __put_user((ed
- && ed->signal_invmap
- && sig < 32
- ? ed->signal_invmap[sig]
- : sig),
- &frame->sig);
- }
+ err |= __put_user(sig, &frame->sig);
if (err)
goto give_sigsegv;
@@ -492,6 +478,11 @@
regs->rsp = (unsigned long) frame;
regs->rip = (unsigned long) ka->sa.sa_handler;
+ /* Make -mregparm=3 work */
+ regs->rax = sig;
+ regs->rdx = 0;
+ regs->rcx = 0;
+
asm volatile("movl %0,%%ds" :: "r" (__USER32_DS));
asm volatile("movl %0,%%es" :: "r" (__USER32_DS));
@@ -499,20 +490,20 @@
regs->ss = __USER32_DS;
set_fs(USER_DS);
- regs->eflags &= ~TF_MASK;
- if (test_thread_flag(TIF_SINGLESTEP))
- ptrace_notify(SIGTRAP);
+ regs->eflags &= ~TF_MASK;
+ if (test_thread_flag(TIF_SINGLESTEP))
+ ptrace_notify(SIGTRAP);
#if DEBUG_SIG
printk("SIG deliver (%s:%d): sp=%p pc=%p ra=%p\n",
current->comm, current->pid, frame, regs->rip, frame->pretcode);
#endif
- return 1;
+ return 0;
give_sigsegv:
force_sigsegv(sig, current);
- return 0;
+ return -EFAULT;
}
int ia32_setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
@@ -595,18 +586,18 @@
regs->ss = __USER32_DS;
set_fs(USER_DS);
- regs->eflags &= ~TF_MASK;
- if (test_thread_flag(TIF_SINGLESTEP))
- ptrace_notify(SIGTRAP);
+ regs->eflags &= ~TF_MASK;
+ if (test_thread_flag(TIF_SINGLESTEP))
+ ptrace_notify(SIGTRAP);
#if DEBUG_SIG
printk("SIG deliver (%s:%d): sp=%p pc=%p ra=%p\n",
current->comm, current->pid, frame, regs->rip, frame->pretcode);
#endif
- return 1;
+ return 0;
give_sigsegv:
force_sigsegv(sig, current);
- return 0;
+ return -EFAULT;
}
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index 5d4a7d1..b4aa875 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -71,6 +71,7 @@
*/
ENTRY(ia32_sysenter_target)
CFI_STARTPROC32 simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,0
CFI_REGISTER rsp,rbp
swapgs
@@ -186,6 +187,7 @@
*/
ENTRY(ia32_cstar_target)
CFI_STARTPROC32 simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,PDA_STACKOFFSET
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
@@ -293,6 +295,7 @@
ENTRY(ia32_syscall)
CFI_STARTPROC simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,SS+8-RIP
/*CFI_REL_OFFSET ss,SS-RIP*/
CFI_REL_OFFSET rsp,RSP-RIP
@@ -370,6 +373,7 @@
popq %r11
CFI_ENDPROC
CFI_STARTPROC32 simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,SS+8-ARGOFFSET
CFI_REL_OFFSET rax,RAX-ARGOFFSET
CFI_REL_OFFSET rcx,RCX-ARGOFFSET
@@ -703,8 +707,8 @@
.quad sys_readlinkat /* 305 */
.quad sys_fchmodat
.quad sys_faccessat
- .quad quiet_ni_syscall /* pselect6 for now */
- .quad quiet_ni_syscall /* ppoll for now */
+ .quad compat_sys_pselect6
+ .quad compat_sys_ppoll
.quad sys_unshare /* 310 */
.quad compat_sys_set_robust_list
.quad compat_sys_get_robust_list
@@ -713,4 +717,5 @@
.quad sys_tee
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
+ .quad sys_getcpu
ia32_syscall_end:
diff --git a/arch/x86_64/ia32/ptrace32.c b/arch/x86_64/ia32/ptrace32.c
index 659c072..d18198e 100644
--- a/arch/x86_64/ia32/ptrace32.c
+++ b/arch/x86_64/ia32/ptrace32.c
@@ -117,6 +117,10 @@
if ((0x5454 >> ((val >> (16 + 4*i)) & 0xf)) & 1)
return -EIO;
child->thread.debugreg7 = val;
+ if (val)
+ set_tsk_thread_flag(child, TIF_DEBUG);
+ else
+ clear_tsk_thread_flag(child, TIF_DEBUG);
break;
default:
@@ -371,8 +375,10 @@
ret = -EIO;
if (!access_ok(VERIFY_READ, u, sizeof(*u)))
break;
- /* no checking to be bug-to-bug compatible with i386 */
- __copy_from_user(&child->thread.i387.fxsave, u, sizeof(*u));
+ /* no checking to be bug-to-bug compatible with i386. */
+ /* but silence warning */
+ if (__copy_from_user(&child->thread.i387.fxsave, u, sizeof(*u)))
+ ;
set_stopped_child_used_math(child);
child->thread.i387.fxsave.mxcsr &= mxcsr_feature_mask;
ret = 0;
diff --git a/arch/x86_64/ia32/sys_ia32.c b/arch/x86_64/ia32/sys_ia32.c
index 9c13099..b0e82c7 100644
--- a/arch/x86_64/ia32/sys_ia32.c
+++ b/arch/x86_64/ia32/sys_ia32.c
@@ -60,6 +60,7 @@
#include <linux/highuid.h>
#include <linux/vmalloc.h>
#include <linux/fsnotify.h>
+#include <linux/sysctl.h>
#include <asm/mman.h>
#include <asm/types.h>
#include <asm/uaccess.h>
@@ -389,7 +390,9 @@
}
}
set_fs (KERNEL_DS);
- ret = sys_rt_sigprocmask(how, set ? &s : NULL, oset ? &s : NULL,
+ ret = sys_rt_sigprocmask(how,
+ set ? (sigset_t __user *)&s : NULL,
+ oset ? (sigset_t __user *)&s : NULL,
sigsetsize);
set_fs (old_fs);
if (ret) return ret;
@@ -541,7 +544,7 @@
int bitcount = 0;
set_fs (KERNEL_DS);
- ret = sys_sysinfo(&s);
+ ret = sys_sysinfo((struct sysinfo __user *)&s);
set_fs (old_fs);
/* Check to see if any memory value is too large for 32-bit and scale
@@ -589,7 +592,7 @@
mm_segment_t old_fs = get_fs ();
set_fs (KERNEL_DS);
- ret = sys_sched_rr_get_interval(pid, &t);
+ ret = sys_sched_rr_get_interval(pid, (struct timespec __user *)&t);
set_fs (old_fs);
if (put_compat_timespec(&t, interval))
return -EFAULT;
@@ -605,7 +608,7 @@
mm_segment_t old_fs = get_fs();
set_fs (KERNEL_DS);
- ret = sys_rt_sigpending(&s, sigsetsize);
+ ret = sys_rt_sigpending((sigset_t __user *)&s, sigsetsize);
set_fs (old_fs);
if (!ret) {
switch (_NSIG_WORDS) {
@@ -630,7 +633,7 @@
if (copy_siginfo_from_user32(&info, uinfo))
return -EFAULT;
set_fs (KERNEL_DS);
- ret = sys_rt_sigqueueinfo(pid, sig, &info);
+ ret = sys_rt_sigqueueinfo(pid, sig, (siginfo_t __user *)&info);
set_fs (old_fs);
return ret;
}
@@ -666,9 +669,6 @@
size_t oldlen;
int __user *namep;
long ret;
- extern int do_sysctl(int *name, int nlen, void *oldval, size_t *oldlenp,
- void *newval, size_t newlen);
-
if (copy_from_user(&a32, args32, sizeof (a32)))
return -EFAULT;
@@ -692,7 +692,8 @@
set_fs(KERNEL_DS);
lock_kernel();
- ret = do_sysctl(namep, a32.nlen, oldvalp, &oldlen, newvalp, (size_t) a32.newlen);
+ ret = do_sysctl(namep, a32.nlen, oldvalp, (size_t __user *)&oldlen,
+ newvalp, (size_t) a32.newlen);
unlock_kernel();
set_fs(old_fs);
@@ -743,7 +744,8 @@
return -EFAULT;
set_fs(KERNEL_DS);
- ret = sys_sendfile(out_fd, in_fd, offset ? &of : NULL, count);
+ ret = sys_sendfile(out_fd, in_fd, offset ? (off_t __user *)&of : NULL,
+ count);
set_fs(old_fs);
if (offset && put_user(of, offset))
@@ -778,7 +780,7 @@
asmlinkage long sys32_olduname(struct oldold_utsname __user * name)
{
- int error;
+ int err;
if (!name)
return -EFAULT;
@@ -787,27 +789,31 @@
down_read(&uts_sem);
- error = __copy_to_user(&name->sysname,&system_utsname.sysname,__OLD_UTS_LEN);
- __put_user(0,name->sysname+__OLD_UTS_LEN);
- __copy_to_user(&name->nodename,&system_utsname.nodename,__OLD_UTS_LEN);
- __put_user(0,name->nodename+__OLD_UTS_LEN);
- __copy_to_user(&name->release,&system_utsname.release,__OLD_UTS_LEN);
- __put_user(0,name->release+__OLD_UTS_LEN);
- __copy_to_user(&name->version,&system_utsname.version,__OLD_UTS_LEN);
- __put_user(0,name->version+__OLD_UTS_LEN);
+ err = __copy_to_user(&name->sysname,&system_utsname.sysname,
+ __OLD_UTS_LEN);
+ err |= __put_user(0,name->sysname+__OLD_UTS_LEN);
+ err |= __copy_to_user(&name->nodename,&system_utsname.nodename,
+ __OLD_UTS_LEN);
+ err |= __put_user(0,name->nodename+__OLD_UTS_LEN);
+ err |= __copy_to_user(&name->release,&system_utsname.release,
+ __OLD_UTS_LEN);
+ err |= __put_user(0,name->release+__OLD_UTS_LEN);
+ err |= __copy_to_user(&name->version,&system_utsname.version,
+ __OLD_UTS_LEN);
+ err |= __put_user(0,name->version+__OLD_UTS_LEN);
{
char *arch = "x86_64";
if (personality(current->personality) == PER_LINUX32)
arch = "i686";
- __copy_to_user(&name->machine,arch,strlen(arch)+1);
+ err |= __copy_to_user(&name->machine,arch,strlen(arch)+1);
}
up_read(&uts_sem);
- error = error ? -EFAULT : 0;
+ err = err ? -EFAULT : 0;
- return error;
+ return err;
}
long sys32_uname(struct old_utsname __user * name)
@@ -831,7 +837,7 @@
seg = get_fs();
set_fs(KERNEL_DS);
- ret = sys_ustat(dev,&u);
+ ret = sys_ustat(dev, (struct ustat __user *)&u);
set_fs(seg);
if (ret >= 0) {
if (!access_ok(VERIFY_WRITE,u32p,sizeof(struct ustat32)) ||
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index b5aaeaf..3c7cbff 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -11,7 +11,7 @@
pci-dma.o pci-nommu.o alternative.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
-obj-$(CONFIG_X86_MCE) += mce.o
+obj-$(CONFIG_X86_MCE) += mce.o therm_throt.o
obj-$(CONFIG_X86_MCE_INTEL) += mce_intel.o
obj-$(CONFIG_X86_MCE_AMD) += mce_amd.o
obj-$(CONFIG_MTRR) += ../../i386/kernel/cpu/mtrr/
@@ -20,8 +20,8 @@
obj-$(CONFIG_MICROCODE) += microcode.o
obj-$(CONFIG_X86_CPUID) += cpuid.o
obj-$(CONFIG_SMP) += smp.o smpboot.o trampoline.o
-obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o
-obj-$(CONFIG_X86_IO_APIC) += io_apic.o mpparse.o \
+obj-y += apic.o nmi.o
+obj-y += io_apic.o mpparse.o \
genapic.o genapic_cluster.o genapic_flat.o
obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
@@ -39,12 +39,14 @@
obj-$(CONFIG_AUDIT) += audit.o
obj-$(CONFIG_MODULES) += module.o
+obj-$(CONFIG_PCI) += early-quirks.o
obj-y += topology.o
obj-y += intel_cacheinfo.o
CFLAGS_vsyscall.o := $(PROFILING) -g0
+therm_throt-y += ../../i386/kernel/cpu/mcheck/therm_throt.o
bootflag-y += ../../i386/kernel/bootflag.o
cpuid-$(subst m,y,$(CONFIG_X86_CPUID)) += ../../i386/kernel/cpuid.o
topology-y += ../../i386/kernel/topology.o
@@ -54,4 +56,3 @@
i8237-y += ../../i386/kernel/i8237.o
msr-$(subst m,y,$(CONFIG_X86_MSR)) += ../../i386/kernel/msr.o
alternative-y += ../../i386/kernel/alternative.o
-
diff --git a/arch/x86_64/kernel/aperture.c b/arch/x86_64/kernel/aperture.c
index 58af8e7..b487396 100644
--- a/arch/x86_64/kernel/aperture.c
+++ b/arch/x86_64/kernel/aperture.c
@@ -17,6 +17,7 @@
#include <linux/pci_ids.h>
#include <linux/pci.h>
#include <linux/bitops.h>
+#include <linux/ioport.h>
#include <asm/e820.h>
#include <asm/io.h>
#include <asm/proto.h>
@@ -33,6 +34,18 @@
int fix_aperture __initdata = 1;
+static struct resource gart_resource = {
+ .name = "GART",
+ .flags = IORESOURCE_MEM,
+};
+
+static void __init insert_aperture_resource(u32 aper_base, u32 aper_size)
+{
+ gart_resource.start = aper_base;
+ gart_resource.end = aper_base + aper_size - 1;
+ insert_resource(&iomem_resource, &gart_resource);
+}
+
/* This code runs before the PCI subsystem is initialized, so just
access the northbridge directly. */
@@ -48,7 +61,7 @@
/*
* Aperture has to be naturally aligned. This means an 2GB aperture won't
- * have much chances to find a place in the lower 4GB of memory.
+ * have much chance of finding a place in the lower 4GB of memory.
* Unfortunately we cannot move it up because that would make the
* IOMMU useless.
*/
@@ -62,6 +75,7 @@
}
printk("Mapping aperture over %d KB of RAM @ %lx\n",
aper_size >> 10, __pa(p));
+ insert_aperture_resource((u32)__pa(p), aper_size);
return (u32)__pa(p);
}
@@ -198,7 +212,7 @@
u64 aper_base, last_aper_base = 0;
int valid_agp = 0;
- if (iommu_aperture_disabled || !fix_aperture)
+ if (iommu_aperture_disabled || !fix_aperture || !early_pci_allowed())
return;
printk("Checking aperture...\n");
@@ -233,8 +247,13 @@
last_aper_base = aper_base;
}
- if (!fix && !fallback_aper_force)
+ if (!fix && !fallback_aper_force) {
+ if (last_aper_base) {
+ unsigned long n = (32 * 1024 * 1024) << last_aper_order;
+ insert_aperture_resource((u32)last_aper_base, n);
+ }
return;
+ }
if (!fallback_aper_force)
aper_alloc = search_agp_bridge(&aper_order, &valid_agp);
diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 2b8cef0..135ff25 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -25,6 +25,7 @@
#include <linux/kernel_stat.h>
#include <linux/sysdev.h>
#include <linux/module.h>
+#include <linux/ioport.h>
#include <asm/atomic.h>
#include <asm/smp.h>
@@ -36,13 +37,20 @@
#include <asm/idle.h>
#include <asm/proto.h>
#include <asm/timex.h>
+#include <asm/apic.h>
+int apic_mapped;
int apic_verbosity;
int apic_runs_main_timer;
int apic_calibrate_pmtmr __initdata;
int disable_apic_timer __initdata;
+static struct resource lapic_resource = {
+ .name = "Local APIC",
+ .flags = IORESOURCE_MEM | IORESOURCE_BUSY,
+};
+
/*
* cpu_mask that denotes the CPUs that needs timer interrupt coming in as
* IPIs in place of local APIC timers
@@ -136,72 +144,40 @@
apic_read(APIC_ESR);
}
-void __init connect_bsp_APIC(void)
-{
- if (pic_mode) {
- /*
- * Do not trust the local APIC being empty at bootup.
- */
- clear_local_APIC();
- /*
- * PIC mode, enable APIC mode in the IMCR, i.e.
- * connect BSP's local APIC to INT and NMI lines.
- */
- apic_printk(APIC_VERBOSE, "leaving PIC mode, enabling APIC mode.\n");
- outb(0x70, 0x22);
- outb(0x01, 0x23);
- }
-}
-
void disconnect_bsp_APIC(int virt_wire_setup)
{
- if (pic_mode) {
- /*
- * Put the board back into PIC mode (has an effect
- * only on certain older boards). Note that APIC
- * interrupts, including IPIs, won't work beyond
- * this point! The only exception are INIT IPIs.
- */
- apic_printk(APIC_QUIET, "disabling APIC mode, entering PIC mode.\n");
- outb(0x70, 0x22);
- outb(0x00, 0x23);
+ /* Go back to Virtual Wire compatibility mode */
+ unsigned long value;
+
+ /* For the spurious interrupt use vector F, and enable it */
+ value = apic_read(APIC_SPIV);
+ value &= ~APIC_VECTOR_MASK;
+ value |= APIC_SPIV_APIC_ENABLED;
+ value |= 0xf;
+ apic_write(APIC_SPIV, value);
+
+ if (!virt_wire_setup) {
+ /* For LVT0 make it edge triggered, active high, external and enabled */
+ value = apic_read(APIC_LVT0);
+ value &= ~(APIC_MODE_MASK | APIC_SEND_PENDING |
+ APIC_INPUT_POLARITY | APIC_LVT_REMOTE_IRR |
+ APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED );
+ value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING;
+ value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_EXTINT);
+ apic_write(APIC_LVT0, value);
+ } else {
+ /* Disable LVT0 */
+ apic_write(APIC_LVT0, APIC_LVT_MASKED);
}
- else {
- /* Go back to Virtual Wire compatibility mode */
- unsigned long value;
- /* For the spurious interrupt use vector F, and enable it */
- value = apic_read(APIC_SPIV);
- value &= ~APIC_VECTOR_MASK;
- value |= APIC_SPIV_APIC_ENABLED;
- value |= 0xf;
- apic_write(APIC_SPIV, value);
-
- if (!virt_wire_setup) {
- /* For LVT0 make it edge triggered, active high, external and enabled */
- value = apic_read(APIC_LVT0);
- value &= ~(APIC_MODE_MASK | APIC_SEND_PENDING |
- APIC_INPUT_POLARITY | APIC_LVT_REMOTE_IRR |
- APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED );
- value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING;
- value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_EXTINT);
- apic_write(APIC_LVT0, value);
- }
- else {
- /* Disable LVT0 */
- apic_write(APIC_LVT0, APIC_LVT_MASKED);
- }
-
- /* For LVT1 make it edge triggered, active high, nmi and enabled */
- value = apic_read(APIC_LVT1);
- value &= ~(
- APIC_MODE_MASK | APIC_SEND_PENDING |
+ /* For LVT1 make it edge triggered, active high, nmi and enabled */
+ value = apic_read(APIC_LVT1);
+ value &= ~(APIC_MODE_MASK | APIC_SEND_PENDING |
APIC_INPUT_POLARITY | APIC_LVT_REMOTE_IRR |
APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED);
- value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING;
- value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_NMI);
- apic_write(APIC_LVT1, value);
- }
+ value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING;
+ value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_NMI);
+ apic_write(APIC_LVT1, value);
}
void disable_local_APIC(void)
@@ -297,8 +273,6 @@
| APIC_DM_INIT);
}
-extern void __error_in_apic_c (void);
-
/*
* An initial setup of the virtual wire mode.
*/
@@ -345,8 +319,7 @@
value = apic_read(APIC_LVR);
- if ((SPURIOUS_APIC_VECTOR & 0x0f) != 0x0f)
- __error_in_apic_c();
+ BUILD_BUG_ON((SPURIOUS_APIC_VECTOR & 0x0f) != 0x0f);
/*
* Double-check whether this APIC is really registered.
@@ -399,32 +372,8 @@
*/
value |= APIC_SPIV_APIC_ENABLED;
- /*
- * Some unknown Intel IO/APIC (or APIC) errata is biting us with
- * certain networking cards. If high frequency interrupts are
- * happening on a particular IOAPIC pin, plus the IOAPIC routing
- * entry is masked/unmasked at a high rate as well then sooner or
- * later IOAPIC line gets 'stuck', no more interrupts are received
- * from the device. If focus CPU is disabled then the hang goes
- * away, oh well :-(
- *
- * [ This bug can be reproduced easily with a level-triggered
- * PCI Ne2000 networking cards and PII/PIII processors, dual
- * BX chipset. ]
- */
- /*
- * Actually disabling the focus CPU check just makes the hang less
- * frequent as it makes the interrupt distributon model be more
- * like LRU than MRU (the short-term load is more even across CPUs).
- * See also the comment in end_level_ioapic_irq(). --macro
- */
-#if 1
- /* Enable focus processor (bit==0) */
- value &= ~APIC_SPIV_FOCUS_DISABLED;
-#else
- /* Disable focus processor (bit==1) */
- value |= APIC_SPIV_FOCUS_DISABLED;
-#endif
+ /* We always use processor focus */
+
/*
* Set spurious IRQ vector
*/
@@ -442,7 +391,7 @@
* TODO: set up through-local-APIC from through-I/O-APIC? --macro
*/
value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
- if (!smp_processor_id() && (pic_mode || !value)) {
+ if (!smp_processor_id() && !value) {
value = APIC_DM_EXTINT;
apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n", smp_processor_id());
} else {
@@ -479,8 +428,7 @@
}
nmi_watchdog_default();
- if (nmi_watchdog == NMI_LOCAL_APIC)
- setup_apic_nmi_watchdog();
+ setup_apic_nmi_watchdog(NULL);
apic_pm_activate();
}
@@ -527,8 +475,7 @@
apic_pm_state.apic_tmict = apic_read(APIC_TMICT);
apic_pm_state.apic_tdcr = apic_read(APIC_TDCR);
apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
- local_save_flags(flags);
- local_irq_disable();
+ local_irq_save(flags);
disable_local_APIC();
local_irq_restore(flags);
return 0;
@@ -606,18 +553,24 @@
static int __init apic_set_verbosity(char *str)
{
+ if (str == NULL) {
+ skip_ioapic_setup = 0;
+ ioapic_force = 1;
+ return 0;
+ }
if (strcmp("debug", str) == 0)
apic_verbosity = APIC_DEBUG;
else if (strcmp("verbose", str) == 0)
apic_verbosity = APIC_VERBOSE;
- else
+ else {
printk(KERN_WARNING "APIC Verbosity level %s not recognised"
- " use apic=verbose or apic=debug", str);
+ " use apic=verbose or apic=debug\n", str);
+ return -EINVAL;
+ }
- return 1;
+ return 0;
}
-
-__setup("apic=", apic_set_verbosity);
+early_param("apic", apic_set_verbosity);
/*
* Detect and enable local APICs on non-SMP boards.
@@ -638,6 +591,40 @@
return 0;
}
+#ifdef CONFIG_X86_IO_APIC
+static struct resource * __init ioapic_setup_resources(void)
+{
+#define IOAPIC_RESOURCE_NAME_SIZE 11
+ unsigned long n;
+ struct resource *res;
+ char *mem;
+ int i;
+
+ if (nr_ioapics <= 0)
+ return NULL;
+
+ n = IOAPIC_RESOURCE_NAME_SIZE + sizeof(struct resource);
+ n *= nr_ioapics;
+
+ res = alloc_bootmem(n);
+
+ if (!res)
+ return NULL;
+
+ memset(res, 0, n);
+ mem = (void *)&res[nr_ioapics];
+
+ for (i = 0; i < nr_ioapics; i++) {
+ res[i].name = mem;
+ res[i].flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ snprintf(mem, IOAPIC_RESOURCE_NAME_SIZE, "IOAPIC %u", i);
+ mem += IOAPIC_RESOURCE_NAME_SIZE;
+ }
+
+ return res;
+}
+#endif
+
void __init init_apic_mappings(void)
{
unsigned long apic_phys;
@@ -654,19 +641,26 @@
apic_phys = mp_lapic_addr;
set_fixmap_nocache(FIX_APIC_BASE, apic_phys);
+ apic_mapped = 1;
apic_printk(APIC_VERBOSE,"mapped APIC to %16lx (%16lx)\n", APIC_BASE, apic_phys);
+ /* Put local APIC into the resource map. */
+ lapic_resource.start = apic_phys;
+ lapic_resource.end = lapic_resource.start + PAGE_SIZE - 1;
+ insert_resource(&iomem_resource, &lapic_resource);
+
/*
* Fetch the APIC ID of the BSP in case we have a
* default configuration (or the MP table is broken).
*/
boot_cpu_id = GET_APIC_ID(apic_read(APIC_ID));
-#ifdef CONFIG_X86_IO_APIC
{
unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
int i;
+ struct resource *ioapic_res;
+ ioapic_res = ioapic_setup_resources();
for (i = 0; i < nr_ioapics; i++) {
if (smp_found_config) {
ioapic_phys = mp_ioapics[i].mpc_apicaddr;
@@ -678,9 +672,15 @@
apic_printk(APIC_VERBOSE,"mapped IOAPIC to %016lx (%016lx)\n",
__fix_to_virt(idx), ioapic_phys);
idx++;
+
+ if (ioapic_res) {
+ ioapic_res->start = ioapic_phys;
+ ioapic_res->end = ioapic_phys + (4 * 1024) - 1;
+ insert_resource(&iomem_resource, ioapic_res);
+ ioapic_res++;
+ }
}
}
-#endif
}
/*
@@ -951,7 +951,7 @@
* We take the 'long' return path, and there every subsystem
* grabs the appropriate locks (kernel lock/ irq lock).
*
- * we might want to decouple profiling from the 'long path',
+ * We might want to decouple profiling from the 'long path',
* and do the profiling totally in assembly.
*
* Currently this isn't too much of an issue (performance wise),
@@ -1123,19 +1123,15 @@
verify_local_APIC();
- connect_bsp_APIC();
-
phys_cpu_present_map = physid_mask_of_physid(boot_cpu_id);
apic_write(APIC_ID, SET_APIC_ID(boot_cpu_id));
setup_local_APIC();
-#ifdef CONFIG_X86_IO_APIC
if (smp_found_config && !skip_ioapic_setup && nr_ioapics)
- setup_IO_APIC();
+ setup_IO_APIC();
else
nr_ioapics = 0;
-#endif
setup_boot_APIC_clock();
check_nmi_watchdog();
return 0;
@@ -1144,14 +1140,17 @@
static __init int setup_disableapic(char *str)
{
disable_apic = 1;
- return 1;
-}
+ clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
+ return 0;
+}
+early_param("disableapic", setup_disableapic);
+/* same as disableapic, for compatibility */
static __init int setup_nolapic(char *str)
{
- disable_apic = 1;
- return 1;
+ return setup_disableapic(str);
}
+early_param("nolapic", setup_nolapic);
static __init int setup_noapictimer(char *str)
{
@@ -1184,11 +1183,5 @@
}
__setup("apicpmtimer", setup_apicpmtimer);
-/* dummy parsing: see setup.c */
-
-__setup("disableapic", setup_disableapic);
-__setup("nolapic", setup_nolapic); /* same as disableapic, for compatibility */
-
__setup("noapictimer", setup_noapictimer);
-/* no "lapic" flag - we only use the lapic when the BIOS tells us so. */
diff --git a/arch/x86_64/kernel/crash.c b/arch/x86_64/kernel/crash.c
index d8d5750..3525f88 100644
--- a/arch/x86_64/kernel/crash.c
+++ b/arch/x86_64/kernel/crash.c
@@ -23,6 +23,7 @@
#include <asm/nmi.h>
#include <asm/hw_irq.h>
#include <asm/mach_apic.h>
+#include <asm/kdebug.h>
/* This keeps a track of which one is crashing cpu. */
static int crashing_cpu;
@@ -68,7 +69,7 @@
* for the data I pass, and I need tags
* on the data to indicate what information I have
* squirrelled away. ELF notes happen to provide
- * all of that that no need to invent something new.
+ * all of that, no need to invent something new.
*/
buf = (u32*)per_cpu_ptr(crash_notes, cpu);
@@ -95,15 +96,25 @@
#ifdef CONFIG_SMP
static atomic_t waiting_for_crash_ipi;
-static int crash_nmi_callback(struct pt_regs *regs, int cpu)
+static int crash_nmi_callback(struct notifier_block *self,
+ unsigned long val, void *data)
{
+ struct pt_regs *regs;
+ int cpu;
+
+ if (val != DIE_NMI_IPI)
+ return NOTIFY_OK;
+
+ regs = ((struct die_args *)data)->regs;
+ cpu = raw_smp_processor_id();
+
/*
* Don't do anything if this handler is invoked on crashing cpu.
* Otherwise, system will completely hang. Crashing cpu can get
* an NMI if system was initially booted with nmi_watchdog parameter.
*/
if (cpu == crashing_cpu)
- return 1;
+ return NOTIFY_STOP;
local_irq_disable();
crash_save_this_cpu(regs, cpu);
@@ -127,12 +138,17 @@
* cpu hotplug shouldn't matter.
*/
+static struct notifier_block crash_nmi_nb = {
+ .notifier_call = crash_nmi_callback,
+};
+
static void nmi_shootdown_cpus(void)
{
unsigned long msecs;
atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
- set_nmi_callback(crash_nmi_callback);
+ if (register_die_notifier(&crash_nmi_nb))
+ return; /* return what? */
/*
* Ensure the new callback function is set before sending
@@ -178,9 +194,7 @@
if(cpu_has_apic)
disable_local_APIC();
-#if defined(CONFIG_X86_IO_APIC)
disable_IO_APIC();
-#endif
crash_save_self(regs);
}
diff --git a/arch/x86_64/kernel/e820.c b/arch/x86_64/kernel/e820.c
index 708a3cd..c0af382 100644
--- a/arch/x86_64/kernel/e820.c
+++ b/arch/x86_64/kernel/e820.c
@@ -25,6 +25,8 @@
#include <asm/bootsetup.h>
#include <asm/sections.h>
+struct e820map e820 __initdata;
+
/*
* PFN of last memory page.
*/
@@ -41,7 +43,7 @@
/*
* Last pfn which the user wants to use.
*/
-unsigned long end_user_pfn = MAXMEM>>PAGE_SHIFT;
+static unsigned long __initdata end_user_pfn = MAXMEM>>PAGE_SHIFT;
extern struct resource code_resource, data_resource;
@@ -70,12 +72,7 @@
return 1;
}
#endif
- /* kernel code + 640k memory hole (later should not be needed, but
- be paranoid for now) */
- if (last >= 640*1024 && addr < 1024*1024) {
- *addrp = 1024*1024;
- return 1;
- }
+ /* kernel code */
if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
*addrp = __pa_symbol(&_end);
return 1;
@@ -565,13 +562,6 @@
* If we're lucky and live on a modern system, the setup code
* will have given us a memory map that we can use to properly
* set up memory. If we aren't, we'll fake a memory map.
- *
- * We check to see that the memory map contains at least 2 elements
- * before we'll use it, because the detection code in setup.S may
- * not be perfect and most every PC known to man has two memory
- * regions: one from 0 to 640k, and one from 1mb up. (The IBM
- * thinkpad 560x, for example, does not cooperate with the memory
- * detection code.)
*/
static int __init copy_e820_map(struct e820entry * biosmap, int nr_map)
{
@@ -589,34 +579,19 @@
if (start > end)
return -1;
- /*
- * Some BIOSes claim RAM in the 640k - 1M region.
- * Not right. Fix it up.
- *
- * This should be removed on Hammer which is supposed to not
- * have non e820 covered ISA mappings there, but I had some strange
- * problems so it stays for now. -AK
- */
- if (type == E820_RAM) {
- if (start < 0x100000ULL && end > 0xA0000ULL) {
- if (start < 0xA0000ULL)
- add_memory_region(start, 0xA0000ULL-start, type);
- if (end <= 0x100000ULL)
- continue;
- start = 0x100000ULL;
- size = end - start;
- }
- }
-
add_memory_region(start, size, type);
} while (biosmap++,--nr_map);
return 0;
}
+void early_panic(char *msg)
+{
+ early_printk(msg);
+ panic(msg);
+}
+
void __init setup_memory_region(void)
{
- char *who = "BIOS-e820";
-
/*
* Try to copy the BIOS-supplied E820-map.
*
@@ -624,51 +599,70 @@
* the next section from 1mb->appropriate_mem_k
*/
sanitize_e820_map(E820_MAP, &E820_MAP_NR);
- if (copy_e820_map(E820_MAP, E820_MAP_NR) < 0) {
- unsigned long mem_size;
-
- /* compare results from other methods and take the greater */
- if (ALT_MEM_K < EXT_MEM_K) {
- mem_size = EXT_MEM_K;
- who = "BIOS-88";
- } else {
- mem_size = ALT_MEM_K;
- who = "BIOS-e801";
- }
-
- e820.nr_map = 0;
- add_memory_region(0, LOWMEMSIZE(), E820_RAM);
- add_memory_region(HIGH_MEMORY, mem_size << 10, E820_RAM);
- }
+ if (copy_e820_map(E820_MAP, E820_MAP_NR) < 0)
+ early_panic("Cannot find a valid memory map");
printk(KERN_INFO "BIOS-provided physical RAM map:\n");
- e820_print_map(who);
+ e820_print_map("BIOS-e820");
}
-void __init parse_memopt(char *p, char **from)
-{
- end_user_pfn = memparse(p, from);
- end_user_pfn >>= PAGE_SHIFT;
-}
-
-void __init parse_memmapopt(char *p, char **from)
+static int __init parse_memopt(char *p)
{
+ if (!p)
+ return -EINVAL;
+ end_user_pfn = memparse(p, &p);
+ end_user_pfn >>= PAGE_SHIFT;
+ return 0;
+}
+early_param("mem", parse_memopt);
+
+static int userdef __initdata;
+
+static int __init parse_memmap_opt(char *p)
+{
+ char *oldp;
unsigned long long start_at, mem_size;
- mem_size = memparse(p, from);
- p = *from;
+ if (!strcmp(p, "exactmap")) {
+#ifdef CONFIG_CRASH_DUMP
+ /* If we are doing a crash dump, we
+ * still need to know the real mem
+ * size before original memory map is
+ * reset.
+ */
+ saved_max_pfn = e820_end_of_ram();
+#endif
+ end_pfn_map = 0;
+ e820.nr_map = 0;
+ userdef = 1;
+ return 0;
+ }
+
+ oldp = p;
+ mem_size = memparse(p, &p);
+ if (p == oldp)
+ return -EINVAL;
if (*p == '@') {
- start_at = memparse(p+1, from);
+ start_at = memparse(p+1, &p);
add_memory_region(start_at, mem_size, E820_RAM);
} else if (*p == '#') {
- start_at = memparse(p+1, from);
+ start_at = memparse(p+1, &p);
add_memory_region(start_at, mem_size, E820_ACPI);
} else if (*p == '$') {
- start_at = memparse(p+1, from);
+ start_at = memparse(p+1, &p);
add_memory_region(start_at, mem_size, E820_RESERVED);
} else {
end_user_pfn = (mem_size >> PAGE_SHIFT);
}
- p = *from;
+ return *p == '\0' ? 0 : -EINVAL;
+}
+early_param("memmap", parse_memmap_opt);
+
+void finish_e820_parsing(void)
+{
+ if (userdef) {
+ printk(KERN_INFO "user-defined physical RAM map:\n");
+ e820_print_map("user");
+ }
}
unsigned long pci_mem_start = 0xaeedbabe;
diff --git a/arch/x86_64/kernel/early-quirks.c b/arch/x86_64/kernel/early-quirks.c
new file mode 100644
index 0000000..208e38a
--- /dev/null
+++ b/arch/x86_64/kernel/early-quirks.c
@@ -0,0 +1,122 @@
+/* Various workarounds for chipset bugs.
+ This code runs very early and can't use the regular PCI subsystem
+ The entries are keyed to PCI bridges which usually identify chipsets
+ uniquely.
+ This is only for whole classes of chipsets with specific problems which
+ need early invasive action (e.g. before the timers are initialized).
+ Most PCI device specific workarounds can be done later and should be
+ in standard PCI quirks
+ Mainboard specific bugs should be handled by DMI entries.
+ CPU specific bugs in setup.c */
+
+#include <linux/pci.h>
+#include <linux/acpi.h>
+#include <linux/pci_ids.h>
+#include <asm/pci-direct.h>
+#include <asm/proto.h>
+#include <asm/dma.h>
+
+static void via_bugs(void)
+{
+#ifdef CONFIG_IOMMU
+ if ((end_pfn > MAX_DMA32_PFN || force_iommu) &&
+ !iommu_aperture_allowed) {
+ printk(KERN_INFO
+ "Looks like a VIA chipset. Disabling IOMMU. Override with iommu=allowed\n");
+ iommu_aperture_disabled = 1;
+ }
+#endif
+}
+
+#ifdef CONFIG_ACPI
+
+static int nvidia_hpet_detected __initdata;
+
+static int __init nvidia_hpet_check(unsigned long phys, unsigned long size)
+{
+ nvidia_hpet_detected = 1;
+ return 0;
+}
+#endif
+
+static void nvidia_bugs(void)
+{
+#ifdef CONFIG_ACPI
+ /*
+ * All timer overrides on Nvidia are
+ * wrong unless HPET is enabled.
+ */
+ nvidia_hpet_detected = 0;
+ acpi_table_parse(ACPI_HPET, nvidia_hpet_check);
+ if (nvidia_hpet_detected == 0) {
+ acpi_skip_timer_override = 1;
+ printk(KERN_INFO "Nvidia board "
+ "detected. Ignoring ACPI "
+ "timer override.\n");
+ }
+#endif
+ /* RED-PEN skip them on mptables too? */
+
+}
+
+static void ati_bugs(void)
+{
+#if 1 /* for testing */
+ printk("ATI board detected\n");
+#endif
+ /* No bugs right now */
+}
+
+struct chipset {
+ u16 vendor;
+ void (*f)(void);
+};
+
+static struct chipset early_qrk[] = {
+ { PCI_VENDOR_ID_NVIDIA, nvidia_bugs },
+ { PCI_VENDOR_ID_VIA, via_bugs },
+ { PCI_VENDOR_ID_ATI, ati_bugs },
+ {}
+};
+
+void __init early_quirks(void)
+{
+ int num, slot, func;
+
+ if (!early_pci_allowed())
+ return;
+
+ /* Poor man's PCI discovery */
+ for (num = 0; num < 32; num++) {
+ for (slot = 0; slot < 32; slot++) {
+ for (func = 0; func < 8; func++) {
+ u32 class;
+ u32 vendor;
+ u8 type;
+ int i;
+ class = read_pci_config(num,slot,func,
+ PCI_CLASS_REVISION);
+ if (class == 0xffffffff)
+ break;
+
+ if ((class >> 16) != PCI_CLASS_BRIDGE_PCI)
+ continue;
+
+ vendor = read_pci_config(num, slot, func,
+ PCI_VENDOR_ID);
+ vendor &= 0xffff;
+
+ for (i = 0; early_qrk[i].f; i++)
+ if (early_qrk[i].vendor == vendor) {
+ early_qrk[i].f();
+ return;
+ }
+
+ type = read_pci_config_byte(num, slot, func,
+ PCI_HEADER_TYPE);
+ if (!(type & 0x80))
+ break;
+ }
+ }
+ }
+}
diff --git a/arch/x86_64/kernel/early_printk.c b/arch/x86_64/kernel/early_printk.c
index 140051e..e22ecd5 100644
--- a/arch/x86_64/kernel/early_printk.c
+++ b/arch/x86_64/kernel/early_printk.c
@@ -215,20 +215,16 @@
static int __initdata keep_early;
-int __init setup_early_printk(char *opt)
+static int __init setup_early_printk(char *buf)
{
- char *space;
- char buf[256];
+ if (!buf)
+ return 0;
if (early_console_initialized)
- return 1;
+ return 0;
+ early_console_initialized = 1;
- strlcpy(buf,opt,sizeof(buf));
- space = strchr(buf, ' ');
- if (space)
- *space = 0;
-
- if (strstr(buf,"keep"))
+ if (!strcmp(buf,"keep"))
keep_early = 1;
if (!strncmp(buf, "serial", 6)) {
@@ -248,11 +244,12 @@
early_console = &simnow_console;
keep_early = 1;
}
- early_console_initialized = 1;
register_console(early_console);
return 0;
}
+early_param("earlyprintk", setup_early_printk);
+
void __init disable_early_printk(void)
{
if (!early_console_initialized || !early_console)
@@ -266,4 +263,3 @@
}
}
-__setup("earlyprintk=", setup_early_printk);
diff --git a/arch/x86_64/kernel/entry.S b/arch/x86_64/kernel/entry.S
index aa8d893..2802524 100644
--- a/arch/x86_64/kernel/entry.S
+++ b/arch/x86_64/kernel/entry.S
@@ -4,8 +4,6 @@
* Copyright (C) 1991, 1992 Linus Torvalds
* Copyright (C) 2000, 2001, 2002 Andi Kleen SuSE Labs
* Copyright (C) 2000 Pavel Machek <pavel@suse.cz>
- *
- * $Id$
*/
/*
@@ -22,15 +20,25 @@
* at the top of the kernel process stack.
* - partial stack frame: partially saved registers upto R11.
* - full stack frame: Like partial stack frame, but all register saved.
- *
- * TODO:
- * - schedule it carefully for the final hardware.
+ *
+ * Some macro usage:
+ * - CFI macros are used to generate dwarf2 unwind information for better
+ * backtraces. They don't change any code.
+ * - SAVE_ALL/RESTORE_ALL - Save/restore all registers
+ * - SAVE_ARGS/RESTORE_ARGS - Save/restore registers that C functions modify.
+ * There are unfortunately lots of special cases where some registers
+ * not touched. The macro is a big mess that should be cleaned up.
+ * - SAVE_REST/RESTORE_REST - Handle the registers not saved by SAVE_ARGS.
+ * Gives a full stack frame.
+ * - ENTRY/END Define functions in the symbol table.
+ * - FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK - Fix up the hardware stack
+ * frame that is otherwise undefined after a SYSCALL
+ * - TRACE_IRQ_* - Trace hard interrupt state for lock debugging.
+ * - errorentry/paranoidentry/zeroentry - Define exception entry points.
*/
-#define ASSEMBLY 1
#include <linux/linkage.h>
#include <asm/segment.h>
-#include <asm/smp.h>
#include <asm/cache.h>
#include <asm/errno.h>
#include <asm/dwarf2.h>
@@ -115,6 +123,7 @@
.macro CFI_DEFAULT_STACK start=1
.if \start
CFI_STARTPROC simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,SS+8
.else
CFI_DEF_CFA_OFFSET SS+8
@@ -146,6 +155,10 @@
/* rdi: prev */
ENTRY(ret_from_fork)
CFI_DEFAULT_STACK
+ push kernel_eflags(%rip)
+ CFI_ADJUST_CFA_OFFSET 4
+ popf # reset kernel eflags
+ CFI_ADJUST_CFA_OFFSET -4
call schedule_tail
GET_THREAD_INFO(%rcx)
testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),threadinfo_flags(%rcx)
@@ -199,6 +212,7 @@
ENTRY(system_call)
CFI_STARTPROC simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,PDA_STACKOFFSET
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
@@ -316,6 +330,7 @@
*/
ENTRY(int_ret_from_sys_call)
CFI_STARTPROC simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,SS+8-ARGOFFSET
/*CFI_REL_OFFSET ss,SS-ARGOFFSET*/
CFI_REL_OFFSET rsp,RSP-ARGOFFSET
@@ -476,6 +491,7 @@
*/
.macro _frame ref
CFI_STARTPROC simple
+ CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,SS+8-\ref
/*CFI_REL_OFFSET ss,SS-\ref*/
CFI_REL_OFFSET rsp,RSP-\ref
@@ -511,7 +527,12 @@
testl $3,CS(%rdi)
je 1f
swapgs
-1: incl %gs:pda_irqcount # RED-PEN should check preempt count
+ /* irqcount is used to check if a CPU is already on an interrupt
+ stack or not. While this is essentially redundant with preempt_count
+ it is a little cheaper to use a separate counter in the PDA
+ (short of moving irq_enter into assembly, which would be too
+ much work) */
+1: incl %gs:pda_irqcount
cmoveq %gs:pda_irqstackptr,%rsp
push %rbp # backlink for old unwinder
/*
@@ -619,8 +640,7 @@
#ifdef CONFIG_PREEMPT
/* Returning to kernel space. Check if we need preemption */
/* rcx: threadinfo. interrupts off. */
- .p2align
-retint_kernel:
+ENTRY(retint_kernel)
cmpl $0,threadinfo_preempt_count(%rcx)
jnz retint_restore_args
bt $TIF_NEED_RESCHED,threadinfo_flags(%rcx)
@@ -679,7 +699,6 @@
END(call_function_interrupt)
#endif
-#ifdef CONFIG_X86_LOCAL_APIC
ENTRY(apic_timer_interrupt)
apicinterrupt LOCAL_TIMER_VECTOR,smp_apic_timer_interrupt
END(apic_timer_interrupt)
@@ -691,7 +710,6 @@
ENTRY(spurious_interrupt)
apicinterrupt SPURIOUS_APIC_VECTOR,smp_spurious_interrupt
END(spurious_interrupt)
-#endif
/*
* Exception entry points.
@@ -768,7 +786,9 @@
testl $3,CS(%rsp)
jnz paranoid_userspace\trace
paranoid_swapgs\trace:
+ .if \trace
TRACE_IRQS_IRETQ 0
+ .endif
swapgs
paranoid_restore\trace:
RESTORE_ALL 8
@@ -814,7 +834,7 @@
* Exception entry point. This expects an error code/orig_rax on the stack
* and the exception handler in %rax.
*/
-ENTRY(error_entry)
+KPROBE_ENTRY(error_entry)
_frame RDI
/* rdi slot contains rax, oldrax contains error code */
cld
@@ -898,7 +918,7 @@
cmpq $gs_change,RIP(%rsp)
je error_swapgs
jmp error_sti
-END(error_entry)
+KPROBE_END(error_entry)
/* Reload gs selector with exception handling */
/* edi: new selector */
@@ -1020,8 +1040,7 @@
KPROBE_ENTRY(page_fault)
errorentry do_page_fault
-END(page_fault)
- .previous .text
+KPROBE_END(page_fault)
ENTRY(coprocessor_error)
zeroentry do_coprocessor_error
@@ -1042,8 +1061,7 @@
CFI_ADJUST_CFA_OFFSET 8
paranoidentry do_debug, DEBUG_STACK
paranoidexit
-END(debug)
- .previous .text
+KPROBE_END(debug)
/* runs on exception stack */
KPROBE_ENTRY(nmi)
@@ -1057,8 +1075,7 @@
jmp paranoid_exit1
CFI_ENDPROC
#endif
-END(nmi)
- .previous .text
+KPROBE_END(nmi)
KPROBE_ENTRY(int3)
INTR_FRAME
@@ -1067,8 +1084,7 @@
paranoidentry do_int3, DEBUG_STACK
jmp paranoid_exit1
CFI_ENDPROC
-END(int3)
- .previous .text
+KPROBE_END(int3)
ENTRY(overflow)
zeroentry do_overflow
@@ -1116,8 +1132,7 @@
KPROBE_ENTRY(general_protection)
errorentry do_general_protection
-END(general_protection)
- .previous .text
+KPROBE_END(general_protection)
ENTRY(alignment_check)
errorentry do_alignment_check
diff --git a/arch/x86_64/kernel/genapic_cluster.c b/arch/x86_64/kernel/genapic_cluster.c
index 30209175..cdb90e6 100644
--- a/arch/x86_64/kernel/genapic_cluster.c
+++ b/arch/x86_64/kernel/genapic_cluster.c
@@ -118,7 +118,6 @@
.name = "clustered",
.int_delivery_mode = dest_Fixed,
.int_dest_mode = (APIC_DEST_PHYSICAL != 0),
- .int_delivery_dest = APIC_DEST_PHYSICAL | APIC_DM_FIXED,
.target_cpus = cluster_target_cpus,
.apic_id_registered = cluster_apic_id_registered,
.init_apic_ldr = cluster_init_apic_ldr,
diff --git a/arch/x86_64/kernel/genapic_flat.c b/arch/x86_64/kernel/genapic_flat.c
index eb86d37..50ad153 100644
--- a/arch/x86_64/kernel/genapic_flat.c
+++ b/arch/x86_64/kernel/genapic_flat.c
@@ -49,8 +49,7 @@
unsigned long cfg;
unsigned long flags;
- local_save_flags(flags);
- local_irq_disable();
+ local_irq_save(flags);
/*
* Wait for idle.
@@ -121,7 +120,6 @@
.name = "flat",
.int_delivery_mode = dest_LowestPrio,
.int_dest_mode = (APIC_DEST_LOGICAL != 0),
- .int_delivery_dest = APIC_DEST_LOGICAL | APIC_DM_LOWEST,
.target_cpus = flat_target_cpus,
.apic_id_registered = flat_apic_id_registered,
.init_apic_ldr = flat_init_apic_ldr,
@@ -180,7 +178,6 @@
.name = "physical flat",
.int_delivery_mode = dest_Fixed,
.int_dest_mode = (APIC_DEST_PHYSICAL != 0),
- .int_delivery_dest = APIC_DEST_PHYSICAL | APIC_DM_FIXED,
.target_cpus = physflat_target_cpus,
.apic_id_registered = flat_apic_id_registered,
.init_apic_ldr = flat_init_apic_ldr,/*not needed, but shouldn't hurt*/
diff --git a/arch/x86_64/kernel/head.S b/arch/x86_64/kernel/head.S
index c9739ca..1e6f808 100644
--- a/arch/x86_64/kernel/head.S
+++ b/arch/x86_64/kernel/head.S
@@ -5,8 +5,6 @@
* Copyright (C) 2000 Pavel Machek <pavel@suse.cz>
* Copyright (C) 2000 Karsten Keil <kkeil@suse.de>
* Copyright (C) 2001,2002 Andi Kleen <ak@suse.de>
- *
- * $Id: head.S,v 1.49 2002/03/19 17:39:25 ak Exp $
*/
@@ -187,12 +185,15 @@
/* Finally jump to run C code and to be on real kernel address
* Since we are running on identity-mapped space we have to jump
- * to the full 64bit address , this is only possible as indirect
- * jump
+ * to the full 64bit address, this is only possible as indirect
+ * jump. In addition we need to ensure %cs is set so we make this
+ * a far return.
*/
movq initial_code(%rip),%rax
- pushq $0 # fake return address
- jmp *%rax
+ pushq $0 # fake return address to stop unwinder
+ pushq $__KERNEL_CS # set correct cs
+ pushq %rax # target address in negative space
+ lretq
/* SMP bootup changes these two */
.align 8
@@ -371,7 +372,7 @@
.quad 0,0 /* TSS */
.quad 0,0 /* LDT */
.quad 0,0,0 /* three TLS descriptors */
- .quad 0 /* unused */
+ .quad 0x0000f40000000000 /* node/CPU stored in limit */
gdt_end:
/* asm/segment.h:GDT_ENTRIES must match this */
/* This should be a multiple of the cache line size */
diff --git a/arch/x86_64/kernel/head64.c b/arch/x86_64/kernel/head64.c
index 36647ce..9561eb3 100644
--- a/arch/x86_64/kernel/head64.c
+++ b/arch/x86_64/kernel/head64.c
@@ -45,38 +45,16 @@
new_data = *(int *) (x86_boot_params + NEW_CL_POINTER);
if (!new_data) {
if (OLD_CL_MAGIC != * (u16 *) OLD_CL_MAGIC_ADDR) {
- printk("so old bootloader that it does not support commandline?!\n");
return;
}
new_data = OLD_CL_BASE_ADDR + * (u16 *) OLD_CL_OFFSET;
- printk("old bootloader convention, maybe loadlin?\n");
}
command_line = (char *) ((u64)(new_data));
memcpy(saved_command_line, command_line, COMMAND_LINE_SIZE);
- printk("Bootdata ok (command line is %s)\n", saved_command_line);
-}
-
-static void __init setup_boot_cpu_data(void)
-{
- unsigned int dummy, eax;
-
- /* get vendor info */
- cpuid(0, (unsigned int *)&boot_cpu_data.cpuid_level,
- (unsigned int *)&boot_cpu_data.x86_vendor_id[0],
- (unsigned int *)&boot_cpu_data.x86_vendor_id[8],
- (unsigned int *)&boot_cpu_data.x86_vendor_id[4]);
-
- /* get cpu type */
- cpuid(1, &eax, &dummy, &dummy,
- (unsigned int *) &boot_cpu_data.x86_capability);
- boot_cpu_data.x86 = (eax >> 8) & 0xf;
- boot_cpu_data.x86_model = (eax >> 4) & 0xf;
- boot_cpu_data.x86_mask = eax & 0xf;
}
void __init x86_64_start_kernel(char * real_mode_data)
{
- char *s;
int i;
for (i = 0; i < 256; i++)
@@ -84,10 +62,7 @@
asm volatile("lidt %0" :: "m" (idt_descr));
clear_bss();
- /*
- * This must be called really, really early:
- */
- lockdep_init();
+ early_printk("Kernel alive\n");
/*
* switch to init_level4_pgt from boot_level4_pgt
@@ -103,22 +78,5 @@
#ifdef CONFIG_SMP
cpu_set(0, cpu_online_map);
#endif
- s = strstr(saved_command_line, "earlyprintk=");
- if (s != NULL)
- setup_early_printk(strchr(s, '=') + 1);
-#ifdef CONFIG_NUMA
- s = strstr(saved_command_line, "numa=");
- if (s != NULL)
- numa_setup(s+5);
-#endif
-#ifdef CONFIG_X86_IO_APIC
- if (strstr(saved_command_line, "disableapic"))
- disable_apic = 1;
-#endif
- /* You need early console to see that */
- if (__pa_symbol(&_end) >= KERNEL_TEXT_SIZE)
- panic("Kernel too big for kernel mapping\n");
-
- setup_boot_cpu_data();
start_kernel();
}
diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c
index 0434b1f..2dd51f3 100644
--- a/arch/x86_64/kernel/i8259.c
+++ b/arch/x86_64/kernel/i8259.c
@@ -55,7 +55,6 @@
*/
BUILD_16_IRQS(0x0)
-#ifdef CONFIG_X86_LOCAL_APIC
/*
* The IO-APIC gives us many more interrupt sources. Most of these
* are unused but an SMP system is supposed to have enough memory ...
@@ -75,8 +74,6 @@
BUILD_15_IRQS(0xe)
#endif
-#endif
-
#undef BUILD_16_IRQS
#undef BUILD_15_IRQS
#undef BI
@@ -100,7 +97,6 @@
void (*interrupt[NR_IRQS])(void) = {
IRQLIST_16(0x0),
-#ifdef CONFIG_X86_IO_APIC
IRQLIST_16(0x1), IRQLIST_16(0x2), IRQLIST_16(0x3),
IRQLIST_16(0x4), IRQLIST_16(0x5), IRQLIST_16(0x6), IRQLIST_16(0x7),
IRQLIST_16(0x8), IRQLIST_16(0x9), IRQLIST_16(0xa), IRQLIST_16(0xb),
@@ -110,7 +106,6 @@
, IRQLIST_15(0xe)
#endif
-#endif
};
#undef IRQ
@@ -128,6 +123,8 @@
DEFINE_SPINLOCK(i8259A_lock);
+static int i8259A_auto_eoi;
+
static void end_8259A_irq (unsigned int irq)
{
if (irq > 256) {
@@ -341,6 +338,8 @@
{
unsigned long flags;
+ i8259A_auto_eoi = auto_eoi;
+
spin_lock_irqsave(&i8259A_lock, flags);
outb(0xff, 0x21); /* mask all of 8259A-1 */
@@ -399,7 +398,7 @@
static int i8259A_resume(struct sys_device *dev)
{
- init_8259A(0);
+ init_8259A(i8259A_auto_eoi);
restore_ELCR(irq_trigger);
return 0;
}
@@ -453,9 +452,7 @@
{
int i;
-#ifdef CONFIG_X86_LOCAL_APIC
init_bsp_APIC();
-#endif
init_8259A(0);
for (i = 0; i < NR_IRQS; i++) {
@@ -581,14 +578,12 @@
set_intr_gate(THERMAL_APIC_VECTOR, thermal_interrupt);
set_intr_gate(THRESHOLD_APIC_VECTOR, threshold_interrupt);
-#ifdef CONFIG_X86_LOCAL_APIC
/* self generated IPI for local APIC timer */
set_intr_gate(LOCAL_TIMER_VECTOR, apic_timer_interrupt);
/* IPI vectors for APIC spurious and error interrupts */
set_intr_gate(SPURIOUS_APIC_VECTOR, spurious_interrupt);
set_intr_gate(ERROR_APIC_VECTOR, error_interrupt);
-#endif
/*
* Set the clock to HZ Hz, we already have a valid
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 924a4a3..0491019 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -48,7 +48,7 @@
static int no_timer_check;
-int disable_timer_pin_1 __initdata;
+static int disable_timer_pin_1 __initdata;
int timer_over_8254 __initdata = 0;
@@ -111,6 +111,33 @@
FINAL; \
}
+union entry_union {
+ struct { u32 w1, w2; };
+ struct IO_APIC_route_entry entry;
+};
+
+static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
+{
+ union entry_union eu;
+ unsigned long flags;
+ spin_lock_irqsave(&ioapic_lock, flags);
+ eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
+ eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+ return eu.entry;
+}
+
+static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
+{
+ unsigned long flags;
+ union entry_union eu;
+ eu.entry = e;
+ spin_lock_irqsave(&ioapic_lock, flags);
+ io_apic_write(apic, 0x10 + 2*pin, eu.w1);
+ io_apic_write(apic, 0x11 + 2*pin, eu.w2);
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
#ifdef CONFIG_SMP
static void set_ioapic_affinity_irq(unsigned int irq, cpumask_t mask)
{
@@ -196,13 +223,9 @@
static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
{
struct IO_APIC_route_entry entry;
- unsigned long flags;
/* Check delivery_mode to be sure we're not clearing an SMI pin */
- spin_lock_irqsave(&ioapic_lock, flags);
- *(((int*)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
- *(((int*)&entry) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ entry = ioapic_read_entry(apic, pin);
if (entry.delivery_mode == dest_SMI)
return;
/*
@@ -210,10 +233,7 @@
*/
memset(&entry, 0, sizeof(entry));
entry.mask = 1;
- spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry) + 0));
- io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry) + 1));
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ ioapic_write_entry(apic, pin, entry);
}
static void clear_IO_APIC (void)
@@ -225,14 +245,6 @@
clear_IO_APIC_pin(apic, pin);
}
-/*
- * support for broken MP BIOSs, enables hand-redirection of PIRQ0-7 to
- * specific CPU-side IRQs.
- */
-
-#define MAX_PIRQS 8
-static int pirq_entries [MAX_PIRQS];
-static int pirqs_enabled;
int skip_ioapic_setup;
int ioapic_force;
@@ -241,18 +253,17 @@
static int __init disable_ioapic_setup(char *str)
{
skip_ioapic_setup = 1;
- return 1;
+ return 0;
}
+early_param("noapic", disable_ioapic_setup);
-static int __init enable_ioapic_setup(char *str)
+/* Actually the next is obsolete, but keep it for paranoid reasons -AK */
+static int __init disable_timer_pin_setup(char *arg)
{
- ioapic_force = 1;
- skip_ioapic_setup = 0;
+ disable_timer_pin_1 = 1;
return 1;
}
-
-__setup("noapic", disable_ioapic_setup);
-__setup("apic", enable_ioapic_setup);
+__setup("disable_timer_pin_1", disable_timer_pin_setup);
static int __init setup_disable_8254_timer(char *s)
{
@@ -268,135 +279,6 @@
__setup("disable_8254_timer", setup_disable_8254_timer);
__setup("enable_8254_timer", setup_enable_8254_timer);
-#include <asm/pci-direct.h>
-#include <linux/pci_ids.h>
-#include <linux/pci.h>
-
-
-#ifdef CONFIG_ACPI
-
-static int nvidia_hpet_detected __initdata;
-
-static int __init nvidia_hpet_check(unsigned long phys, unsigned long size)
-{
- nvidia_hpet_detected = 1;
- return 0;
-}
-#endif
-
-/* Temporary Hack. Nvidia and VIA boards currently only work with IO-APIC
- off. Check for an Nvidia or VIA PCI bridge and turn it off.
- Use pci direct infrastructure because this runs before the PCI subsystem.
-
- Can be overwritten with "apic"
-
- And another hack to disable the IOMMU on VIA chipsets.
-
- ... and others. Really should move this somewhere else.
-
- Kludge-O-Rama. */
-void __init check_ioapic(void)
-{
- int num,slot,func;
- /* Poor man's PCI discovery */
- for (num = 0; num < 32; num++) {
- for (slot = 0; slot < 32; slot++) {
- for (func = 0; func < 8; func++) {
- u32 class;
- u32 vendor;
- u8 type;
- class = read_pci_config(num,slot,func,
- PCI_CLASS_REVISION);
- if (class == 0xffffffff)
- break;
-
- if ((class >> 16) != PCI_CLASS_BRIDGE_PCI)
- continue;
-
- vendor = read_pci_config(num, slot, func,
- PCI_VENDOR_ID);
- vendor &= 0xffff;
- switch (vendor) {
- case PCI_VENDOR_ID_VIA:
-#ifdef CONFIG_IOMMU
- if ((end_pfn > MAX_DMA32_PFN ||
- force_iommu) &&
- !iommu_aperture_allowed) {
- printk(KERN_INFO
- "Looks like a VIA chipset. Disabling IOMMU. Override with \"iommu=allowed\"\n");
- iommu_aperture_disabled = 1;
- }
-#endif
- return;
- case PCI_VENDOR_ID_NVIDIA:
-#ifdef CONFIG_ACPI
- /*
- * All timer overrides on Nvidia are
- * wrong unless HPET is enabled.
- */
- nvidia_hpet_detected = 0;
- acpi_table_parse(ACPI_HPET,
- nvidia_hpet_check);
- if (nvidia_hpet_detected == 0) {
- acpi_skip_timer_override = 1;
- printk(KERN_INFO "Nvidia board "
- "detected. Ignoring ACPI "
- "timer override.\n");
- }
-#endif
- /* RED-PEN skip them on mptables too? */
- return;
-
- /* This should be actually default, but
- for 2.6.16 let's do it for ATI only where
- it's really needed. */
- case PCI_VENDOR_ID_ATI:
- if (timer_over_8254 == 1) {
- timer_over_8254 = 0;
- printk(KERN_INFO
- "ATI board detected. Disabling timer routing over 8254.\n");
- }
- return;
- }
-
-
- /* No multi-function device? */
- type = read_pci_config_byte(num,slot,func,
- PCI_HEADER_TYPE);
- if (!(type & 0x80))
- break;
- }
- }
- }
-}
-
-static int __init ioapic_pirq_setup(char *str)
-{
- int i, max;
- int ints[MAX_PIRQS+1];
-
- get_options(str, ARRAY_SIZE(ints), ints);
-
- for (i = 0; i < MAX_PIRQS; i++)
- pirq_entries[i] = -1;
-
- pirqs_enabled = 1;
- apic_printk(APIC_VERBOSE, "PIRQ redirection, working around broken MP-BIOS.\n");
- max = MAX_PIRQS;
- if (ints[0] < MAX_PIRQS)
- max = ints[0];
-
- for (i = 0; i < max; i++) {
- apic_printk(APIC_VERBOSE, "... PIRQ%d -> IRQ %d\n", i, ints[i+1]);
- /*
- * PIRQs are mapped upside down, usually.
- */
- pirq_entries[MAX_PIRQS-i-1] = ints[i+1];
- }
- return 1;
-}
-
-__setup("pirq=", ioapic_pirq_setup);
/*
* Find the IRQ entry number of a certain pin.
@@ -425,9 +307,7 @@
for (i = 0; i < mp_irq_entries; i++) {
int lbus = mp_irqs[i].mpc_srcbus;
- if ((mp_bus_id_to_type[lbus] == MP_BUS_ISA ||
- mp_bus_id_to_type[lbus] == MP_BUS_EISA ||
- mp_bus_id_to_type[lbus] == MP_BUS_MCA) &&
+ if (test_bit(lbus, mp_bus_not_pci) &&
(mp_irqs[i].mpc_irqtype == type) &&
(mp_irqs[i].mpc_srcbusirq == irq))
@@ -443,9 +323,7 @@
for (i = 0; i < mp_irq_entries; i++) {
int lbus = mp_irqs[i].mpc_srcbus;
- if ((mp_bus_id_to_type[lbus] == MP_BUS_ISA ||
- mp_bus_id_to_type[lbus] == MP_BUS_EISA ||
- mp_bus_id_to_type[lbus] == MP_BUS_MCA) &&
+ if (test_bit(lbus, mp_bus_not_pci) &&
(mp_irqs[i].mpc_irqtype == type) &&
(mp_irqs[i].mpc_srcbusirq == irq))
break;
@@ -485,7 +363,7 @@
mp_irqs[i].mpc_dstapic == MP_APIC_ALL)
break;
- if ((mp_bus_id_to_type[lbus] == MP_BUS_PCI) &&
+ if (!test_bit(lbus, mp_bus_not_pci) &&
!mp_irqs[i].mpc_irqtype &&
(bus == lbus) &&
(slot == ((mp_irqs[i].mpc_srcbusirq >> 2) & 0x1f))) {
@@ -508,27 +386,6 @@
return best_guess;
}
-/*
- * EISA Edge/Level control register, ELCR
- */
-static int EISA_ELCR(unsigned int irq)
-{
- if (irq < 16) {
- unsigned int port = 0x4d0 + (irq >> 3);
- return (inb(port) >> (irq & 7)) & 1;
- }
- apic_printk(APIC_VERBOSE, "Broken MPtable reports ISA irq %d\n", irq);
- return 0;
-}
-
-/* EISA interrupts are always polarity zero and can be edge or level
- * trigger depending on the ELCR value. If an interrupt is listed as
- * EISA conforming in the MP table, that means its trigger type must
- * be read in from the ELCR */
-
-#define default_EISA_trigger(idx) (EISA_ELCR(mp_irqs[idx].mpc_srcbusirq))
-#define default_EISA_polarity(idx) (0)
-
/* ISA interrupts are always polarity zero edge triggered,
* when listed as conforming in the MP table. */
@@ -541,12 +398,6 @@
#define default_PCI_trigger(idx) (1)
#define default_PCI_polarity(idx) (1)
-/* MCA interrupts are always polarity zero level triggered,
- * when listed as conforming in the MP table. */
-
-#define default_MCA_trigger(idx) (1)
-#define default_MCA_polarity(idx) (0)
-
static int __init MPBIOS_polarity(int idx)
{
int bus = mp_irqs[idx].mpc_srcbus;
@@ -558,38 +409,11 @@
switch (mp_irqs[idx].mpc_irqflag & 3)
{
case 0: /* conforms, ie. bus-type dependent polarity */
- {
- switch (mp_bus_id_to_type[bus])
- {
- case MP_BUS_ISA: /* ISA pin */
- {
- polarity = default_ISA_polarity(idx);
- break;
- }
- case MP_BUS_EISA: /* EISA pin */
- {
- polarity = default_EISA_polarity(idx);
- break;
- }
- case MP_BUS_PCI: /* PCI pin */
- {
- polarity = default_PCI_polarity(idx);
- break;
- }
- case MP_BUS_MCA: /* MCA pin */
- {
- polarity = default_MCA_polarity(idx);
- break;
- }
- default:
- {
- printk(KERN_WARNING "broken BIOS!!\n");
- polarity = 1;
- break;
- }
- }
+ if (test_bit(bus, mp_bus_not_pci))
+ polarity = default_ISA_polarity(idx);
+ else
+ polarity = default_PCI_polarity(idx);
break;
- }
case 1: /* high active */
{
polarity = 0;
@@ -627,38 +451,11 @@
switch ((mp_irqs[idx].mpc_irqflag>>2) & 3)
{
case 0: /* conforms, ie. bus-type dependent */
- {
- switch (mp_bus_id_to_type[bus])
- {
- case MP_BUS_ISA: /* ISA pin */
- {
- trigger = default_ISA_trigger(idx);
- break;
- }
- case MP_BUS_EISA: /* EISA pin */
- {
- trigger = default_EISA_trigger(idx);
- break;
- }
- case MP_BUS_PCI: /* PCI pin */
- {
- trigger = default_PCI_trigger(idx);
- break;
- }
- case MP_BUS_MCA: /* MCA pin */
- {
- trigger = default_MCA_trigger(idx);
- break;
- }
- default:
- {
- printk(KERN_WARNING "broken BIOS!!\n");
- trigger = 1;
- break;
- }
- }
+ if (test_bit(bus, mp_bus_not_pci))
+ trigger = default_ISA_trigger(idx);
+ else
+ trigger = default_PCI_trigger(idx);
break;
- }
case 1: /* edge */
{
trigger = 0;
@@ -764,49 +561,17 @@
if (mp_irqs[idx].mpc_dstirq != pin)
printk(KERN_ERR "broken BIOS or MPTABLE parser, ayiee!!\n");
- switch (mp_bus_id_to_type[bus])
- {
- case MP_BUS_ISA: /* ISA pin */
- case MP_BUS_EISA:
- case MP_BUS_MCA:
- {
- irq = mp_irqs[idx].mpc_srcbusirq;
- break;
- }
- case MP_BUS_PCI: /* PCI pin */
- {
- /*
- * PCI IRQs are mapped in order
- */
- i = irq = 0;
- while (i < apic)
- irq += nr_ioapic_registers[i++];
- irq += pin;
- irq = gsi_irq_sharing(irq);
- break;
- }
- default:
- {
- printk(KERN_ERR "unknown bus type %d.\n",bus);
- irq = 0;
- break;
- }
- }
- BUG_ON(irq >= NR_IRQS);
-
- /*
- * PCI IRQ command line redirection. Yes, limits are hardcoded.
- */
- if ((pin >= 16) && (pin <= 23)) {
- if (pirq_entries[pin-16] != -1) {
- if (!pirq_entries[pin-16]) {
- apic_printk(APIC_VERBOSE, "disabling PIRQ%d\n", pin-16);
- } else {
- irq = pirq_entries[pin-16];
- apic_printk(APIC_VERBOSE, "using PIRQ%d -> IRQ %d\n",
- pin-16, irq);
- }
- }
+ if (test_bit(bus, mp_bus_not_pci)) {
+ irq = mp_irqs[idx].mpc_srcbusirq;
+ } else {
+ /*
+ * PCI IRQs are mapped in order
+ */
+ i = irq = 0;
+ while (i < apic)
+ irq += nr_ioapic_registers[i++];
+ irq += pin;
+ irq = gsi_irq_sharing(irq);
}
BUG_ON(irq >= NR_IRQS);
return irq;
@@ -943,9 +708,9 @@
if (!apic && (irq < 16))
disable_8259A_irq(irq);
}
+ ioapic_write_entry(apic, pin, entry);
+
spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(apic, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(apic, 0x10+2*pin, *(((int *)&entry)+0));
set_native_irq_info(irq, TARGET_CPUS);
spin_unlock_irqrestore(&ioapic_lock, flags);
}
@@ -1083,10 +848,7 @@
for (i = 0; i <= reg_01.bits.entries; i++) {
struct IO_APIC_route_entry entry;
- spin_lock_irqsave(&ioapic_lock, flags);
- *(((int *)&entry)+0) = io_apic_read(apic, 0x10+i*2);
- *(((int *)&entry)+1) = io_apic_read(apic, 0x11+i*2);
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ entry = ioapic_read_entry(apic, i);
printk(KERN_DEBUG " %02x %03X %02X ",
i,
@@ -1281,9 +1043,6 @@
irq_2_pin[i].pin = -1;
irq_2_pin[i].next = 0;
}
- if (!pirqs_enabled)
- for (i = 0; i < MAX_PIRQS; i++)
- pirq_entries[i] = -1;
/*
* The number of IO-APIC IRQ registers (== #pins):
@@ -1299,11 +1058,7 @@
/* See if any of the pins is in ExtINT mode */
for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
struct IO_APIC_route_entry entry;
- spin_lock_irqsave(&ioapic_lock, flags);
- *(((int *)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
- *(((int *)&entry) + 1) = io_apic_read(apic, 0x11 + 2 * pin);
- spin_unlock_irqrestore(&ioapic_lock, flags);
-
+ entry = ioapic_read_entry(apic, pin);
/* If the interrupt line is enabled and in ExtInt mode
* I have found the pin where the i8259 is connected.
@@ -1355,7 +1110,6 @@
*/
if (ioapic_i8259.pin != -1) {
struct IO_APIC_route_entry entry;
- unsigned long flags;
memset(&entry, 0, sizeof(entry));
entry.mask = 0; /* Enabled */
@@ -1372,84 +1126,13 @@
/*
* Add it to the IO-APIC irq-routing table:
*/
- spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(ioapic_i8259.apic, 0x11+2*ioapic_i8259.pin,
- *(((int *)&entry)+1));
- io_apic_write(ioapic_i8259.apic, 0x10+2*ioapic_i8259.pin,
- *(((int *)&entry)+0));
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin, entry);
}
disconnect_bsp_APIC(ioapic_i8259.pin != -1);
}
/*
- * function to set the IO-APIC physical IDs based on the
- * values stored in the MPC table.
- *
- * by Matt Domsch <Matt_Domsch@dell.com> Tue Dec 21 12:25:05 CST 1999
- */
-
-static void __init setup_ioapic_ids_from_mpc (void)
-{
- union IO_APIC_reg_00 reg_00;
- int apic;
- int i;
- unsigned char old_id;
- unsigned long flags;
-
- /*
- * Set the IOAPIC ID to the value stored in the MPC table.
- */
- for (apic = 0; apic < nr_ioapics; apic++) {
-
- /* Read the register 0 value */
- spin_lock_irqsave(&ioapic_lock, flags);
- reg_00.raw = io_apic_read(apic, 0);
- spin_unlock_irqrestore(&ioapic_lock, flags);
-
- old_id = mp_ioapics[apic].mpc_apicid;
-
-
- printk(KERN_INFO "Using IO-APIC %d\n", mp_ioapics[apic].mpc_apicid);
-
-
- /*
- * We need to adjust the IRQ routing table
- * if the ID changed.
- */
- if (old_id != mp_ioapics[apic].mpc_apicid)
- for (i = 0; i < mp_irq_entries; i++)
- if (mp_irqs[i].mpc_dstapic == old_id)
- mp_irqs[i].mpc_dstapic
- = mp_ioapics[apic].mpc_apicid;
-
- /*
- * Read the right value from the MPC table and
- * write it into the ID register.
- */
- apic_printk(APIC_VERBOSE,KERN_INFO "...changing IO-APIC physical APIC ID to %d ...",
- mp_ioapics[apic].mpc_apicid);
-
- reg_00.bits.ID = mp_ioapics[apic].mpc_apicid;
- spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(apic, 0, reg_00.raw);
- spin_unlock_irqrestore(&ioapic_lock, flags);
-
- /*
- * Sanity check
- */
- spin_lock_irqsave(&ioapic_lock, flags);
- reg_00.raw = io_apic_read(apic, 0);
- spin_unlock_irqrestore(&ioapic_lock, flags);
- if (reg_00.bits.ID != mp_ioapics[apic].mpc_apicid)
- printk("could not set ID!\n");
- else
- apic_printk(APIC_VERBOSE," ok.\n");
- }
-}
-
-/*
* There is a nasty bug in some older SMP boards, their mptable lies
* about the timer IRQ. We do the following to work around the situation:
*
@@ -1964,11 +1647,6 @@
apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
- /*
- * Set up the IO-APIC IRQ routing table.
- */
- if (!acpi_ioapic)
- setup_ioapic_ids_from_mpc();
sync_Arb_IDs();
setup_IO_APIC_irqs();
init_IO_APIC_traps();
@@ -1987,17 +1665,12 @@
{
struct IO_APIC_route_entry *entry;
struct sysfs_ioapic_data *data;
- unsigned long flags;
int i;
data = container_of(dev, struct sysfs_ioapic_data, dev);
entry = data->entry;
- spin_lock_irqsave(&ioapic_lock, flags);
- for (i = 0; i < nr_ioapic_registers[dev->id]; i ++, entry ++ ) {
- *(((int *)entry) + 1) = io_apic_read(dev->id, 0x11 + 2 * i);
- *(((int *)entry) + 0) = io_apic_read(dev->id, 0x10 + 2 * i);
- }
- spin_unlock_irqrestore(&ioapic_lock, flags);
+ for (i = 0; i < nr_ioapic_registers[dev->id]; i ++, entry ++ )
+ *entry = ioapic_read_entry(dev->id, i);
return 0;
}
@@ -2019,11 +1692,9 @@
reg_00.bits.ID = mp_ioapics[dev->id].mpc_apicid;
io_apic_write(dev->id, 0, reg_00.raw);
}
- for (i = 0; i < nr_ioapic_registers[dev->id]; i ++, entry ++ ) {
- io_apic_write(dev->id, 0x11+2*i, *(((int *)entry)+1));
- io_apic_write(dev->id, 0x10+2*i, *(((int *)entry)+0));
- }
spin_unlock_irqrestore(&ioapic_lock, flags);
+ for (i = 0; i < nr_ioapic_registers[dev->id]; i++)
+ ioapic_write_entry(dev->id, i, entry[i]);
return 0;
}
@@ -2077,19 +1748,6 @@
#define IO_APIC_MAX_ID 0xFE
-int __init io_apic_get_version (int ioapic)
-{
- union IO_APIC_reg_01 reg_01;
- unsigned long flags;
-
- spin_lock_irqsave(&ioapic_lock, flags);
- reg_01.raw = io_apic_read(ioapic, 1);
- spin_unlock_irqrestore(&ioapic_lock, flags);
-
- return reg_01.bits.version;
-}
-
-
int __init io_apic_get_redir_entries (int ioapic)
{
union IO_APIC_reg_01 reg_01;
@@ -2148,10 +1806,10 @@
if (!ioapic && (irq < 16))
disable_8259A_irq(irq);
+ ioapic_write_entry(ioapic, pin, entry);
+
spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_write(ioapic, 0x11+2*pin, *(((int *)&entry)+1));
- io_apic_write(ioapic, 0x10+2*pin, *(((int *)&entry)+0));
- set_native_irq_info(use_pci_vector() ? entry.vector : irq, TARGET_CPUS);
+ set_native_irq_info(use_pci_vector() ? entry.vector : irq, TARGET_CPUS);
spin_unlock_irqrestore(&ioapic_lock, flags);
return 0;
diff --git a/arch/x86_64/kernel/ioport.c b/arch/x86_64/kernel/ioport.c
index b816149..fe063d3 100644
--- a/arch/x86_64/kernel/ioport.c
+++ b/arch/x86_64/kernel/ioport.c
@@ -56,6 +56,7 @@
memset(bitmap, 0xff, IO_BITMAP_BYTES);
t->io_bitmap_ptr = bitmap;
+ set_thread_flag(TIF_IO_BITMAP);
}
/*
diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 5221a53..b3677e6 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -20,11 +20,6 @@
#include <asm/idle.h>
atomic_t irq_err_count;
-#ifdef CONFIG_X86_IO_APIC
-#ifdef APIC_MISMATCH_DEBUG
-atomic_t irq_mis_count;
-#endif
-#endif
#ifdef CONFIG_DEBUG_STACKOVERFLOW
/*
@@ -92,18 +87,11 @@
for_each_online_cpu(j)
seq_printf(p, "%10u ", cpu_pda(j)->__nmi_count);
seq_putc(p, '\n');
-#ifdef CONFIG_X86_LOCAL_APIC
seq_printf(p, "LOC: ");
for_each_online_cpu(j)
seq_printf(p, "%10u ", cpu_pda(j)->apic_timer_irqs);
seq_putc(p, '\n');
-#endif
seq_printf(p, "ERR: %10u\n", atomic_read(&irq_err_count));
-#ifdef CONFIG_X86_IO_APIC
-#ifdef APIC_MISMATCH_DEBUG
- seq_printf(p, "MIS: %10u\n", atomic_read(&irq_mis_count));
-#endif
-#endif
}
return 0;
}
diff --git a/arch/x86_64/kernel/machine_kexec.c b/arch/x86_64/kernel/machine_kexec.c
index 106076b..0497e3b 100644
--- a/arch/x86_64/kernel/machine_kexec.c
+++ b/arch/x86_64/kernel/machine_kexec.c
@@ -15,6 +15,15 @@
#include <asm/mmu_context.h>
#include <asm/io.h>
+#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
+static u64 kexec_pgd[512] PAGE_ALIGNED;
+static u64 kexec_pud0[512] PAGE_ALIGNED;
+static u64 kexec_pmd0[512] PAGE_ALIGNED;
+static u64 kexec_pte0[512] PAGE_ALIGNED;
+static u64 kexec_pud1[512] PAGE_ALIGNED;
+static u64 kexec_pmd1[512] PAGE_ALIGNED;
+static u64 kexec_pte1[512] PAGE_ALIGNED;
+
static void init_level2_page(pmd_t *level2p, unsigned long addr)
{
unsigned long end_addr;
@@ -144,32 +153,19 @@
);
}
-typedef NORET_TYPE void (*relocate_new_kernel_t)(unsigned long indirection_page,
- unsigned long control_code_buffer,
- unsigned long start_address,
- unsigned long pgtable) ATTRIB_NORET;
-
-extern const unsigned char relocate_new_kernel[];
-extern const unsigned long relocate_new_kernel_size;
-
int machine_kexec_prepare(struct kimage *image)
{
- unsigned long start_pgtable, control_code_buffer;
+ unsigned long start_pgtable;
int result;
/* Calculate the offsets */
start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
- control_code_buffer = start_pgtable + PAGE_SIZE;
/* Setup the identity mapped 64bit page table */
result = init_pgtable(image, start_pgtable);
if (result)
return result;
- /* Place the code in the reboot code buffer */
- memcpy(__va(control_code_buffer), relocate_new_kernel,
- relocate_new_kernel_size);
-
return 0;
}
@@ -184,28 +180,34 @@
*/
NORET_TYPE void machine_kexec(struct kimage *image)
{
- unsigned long page_list;
- unsigned long control_code_buffer;
- unsigned long start_pgtable;
- relocate_new_kernel_t rnk;
+ unsigned long page_list[PAGES_NR];
+ void *control_page;
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
- /* Calculate the offsets */
- page_list = image->head;
- start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
- control_code_buffer = start_pgtable + PAGE_SIZE;
+ control_page = page_address(image->control_code_page) + PAGE_SIZE;
+ memcpy(control_page, relocate_kernel, PAGE_SIZE);
- /* Set the low half of the page table to my identity mapped
- * page table for kexec. Leave the high half pointing at the
- * kernel pages. Don't bother to flush the global pages
- * as that will happen when I fully switch to my identity mapped
- * page table anyway.
- */
- memcpy(__va(read_cr3()), __va(start_pgtable), PAGE_SIZE/2);
- __flush_tlb();
+ page_list[PA_CONTROL_PAGE] = __pa(control_page);
+ page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
+ page_list[PA_PGD] = __pa(kexec_pgd);
+ page_list[VA_PGD] = (unsigned long)kexec_pgd;
+ page_list[PA_PUD_0] = __pa(kexec_pud0);
+ page_list[VA_PUD_0] = (unsigned long)kexec_pud0;
+ page_list[PA_PMD_0] = __pa(kexec_pmd0);
+ page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
+ page_list[PA_PTE_0] = __pa(kexec_pte0);
+ page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
+ page_list[PA_PUD_1] = __pa(kexec_pud1);
+ page_list[VA_PUD_1] = (unsigned long)kexec_pud1;
+ page_list[PA_PMD_1] = __pa(kexec_pmd1);
+ page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
+ page_list[PA_PTE_1] = __pa(kexec_pte1);
+ page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+ page_list[PA_TABLE_PAGE] =
+ (unsigned long)__pa(page_address(image->control_code_page));
/* The segment registers are funny things, they have both a
* visible and an invisible part. Whenever the visible part is
@@ -222,7 +224,36 @@
*/
set_gdt(phys_to_virt(0),0);
set_idt(phys_to_virt(0),0);
+
/* now call it */
- rnk = (relocate_new_kernel_t) control_code_buffer;
- (*rnk)(page_list, control_code_buffer, image->start, start_pgtable);
+ relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
+ image->start);
}
+
+/* crashkernel=size@addr specifies the location to reserve for
+ * a crash kernel. By reserving this memory we guarantee
+ * that linux never set's it up as a DMA target.
+ * Useful for holding code to do something appropriate
+ * after a kernel panic.
+ */
+static int __init setup_crashkernel(char *arg)
+{
+ unsigned long size, base;
+ char *p;
+ if (!arg)
+ return -EINVAL;
+ size = memparse(arg, &p);
+ if (arg == p)
+ return -EINVAL;
+ if (*p == '@') {
+ base = memparse(p+1, &p);
+ /* FIXME: Do I want a sanity check to validate the
+ * memory range? Yes you do, but it's too early for
+ * e820 -AK */
+ crashk_res.start = base;
+ crashk_res.end = base + size - 1;
+ }
+ return 0;
+}
+early_param("crashkernel", setup_crashkernel);
+
diff --git a/arch/x86_64/kernel/mce.c b/arch/x86_64/kernel/mce.c
index 4e017fb..bbea888 100644
--- a/arch/x86_64/kernel/mce.c
+++ b/arch/x86_64/kernel/mce.c
@@ -182,7 +182,7 @@
goto out2;
memset(&m, 0, sizeof(struct mce));
- m.cpu = safe_smp_processor_id();
+ m.cpu = smp_processor_id();
rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
if (!(m.mcgstatus & MCG_STATUS_RIPV))
kill_it = 1;
@@ -274,6 +274,33 @@
atomic_dec(&mce_entry);
}
+#ifdef CONFIG_X86_MCE_INTEL
+/***
+ * mce_log_therm_throt_event - Logs the thermal throttling event to mcelog
+ * @cpu: The CPU on which the event occured.
+ * @status: Event status information
+ *
+ * This function should be called by the thermal interrupt after the
+ * event has been processed and the decision was made to log the event
+ * further.
+ *
+ * The status parameter will be saved to the 'status' field of 'struct mce'
+ * and historically has been the register value of the
+ * MSR_IA32_THERMAL_STATUS (Intel) msr.
+ */
+void mce_log_therm_throt_event(unsigned int cpu, __u64 status)
+{
+ struct mce m;
+
+ memset(&m, 0, sizeof(m));
+ m.cpu = cpu;
+ m.bank = MCE_THERMAL_BANK;
+ m.status = status;
+ rdtscll(m.tsc);
+ mce_log(&m);
+}
+#endif /* CONFIG_X86_MCE_INTEL */
+
/*
* Periodic polling timer for "silent" machine check errors.
*/
diff --git a/arch/x86_64/kernel/mce_intel.c b/arch/x86_64/kernel/mce_intel.c
index 8f533d2..6551505 100644
--- a/arch/x86_64/kernel/mce_intel.c
+++ b/arch/x86_64/kernel/mce_intel.c
@@ -11,36 +11,21 @@
#include <asm/mce.h>
#include <asm/hw_irq.h>
#include <asm/idle.h>
-
-static DEFINE_PER_CPU(unsigned long, next_check);
+#include <asm/therm_throt.h>
asmlinkage void smp_thermal_interrupt(void)
{
- struct mce m;
+ __u64 msr_val;
ack_APIC_irq();
exit_idle();
irq_enter();
- if (time_before(jiffies, __get_cpu_var(next_check)))
- goto done;
- __get_cpu_var(next_check) = jiffies + HZ*300;
- memset(&m, 0, sizeof(m));
- m.cpu = smp_processor_id();
- m.bank = MCE_THERMAL_BANK;
- rdtscll(m.tsc);
- rdmsrl(MSR_IA32_THERM_STATUS, m.status);
- if (m.status & 0x1) {
- printk(KERN_EMERG
- "CPU%d: Temperature above threshold, cpu clock throttled\n", m.cpu);
- add_taint(TAINT_MACHINE_CHECK);
- } else {
- printk(KERN_EMERG "CPU%d: Temperature/speed normal\n", m.cpu);
- }
+ rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
+ if (therm_throt_process(msr_val & 1))
+ mce_log_therm_throt_event(smp_processor_id(), msr_val);
- mce_log(&m);
-done:
irq_exit();
}
@@ -92,6 +77,9 @@
apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
printk(KERN_INFO "CPU%d: Thermal monitoring enabled (%s)\n",
cpu, tm2 ? "TM2" : "TM1");
+
+ /* enable thermal throttle processing */
+ atomic_set(&therm_throt_en, 1);
return;
}
diff --git a/arch/x86_64/kernel/mpparse.c b/arch/x86_64/kernel/mpparse.c
index a1ab419..20e88f4 100644
--- a/arch/x86_64/kernel/mpparse.c
+++ b/arch/x86_64/kernel/mpparse.c
@@ -41,8 +41,7 @@
* Various Linux-internal data structures created from the
* MP-table.
*/
-unsigned char apic_version [MAX_APICS];
-unsigned char mp_bus_id_to_type [MAX_MP_BUSSES] = { [0 ... MAX_MP_BUSSES-1] = -1 };
+DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);
int mp_bus_id_to_pci_bus [MAX_MP_BUSSES] = { [0 ... MAX_MP_BUSSES-1] = -1 };
static int mp_current_pci_id = 0;
@@ -56,7 +55,6 @@
int mp_irq_entries;
int nr_ioapics;
-int pic_mode;
unsigned long mp_lapic_addr = 0;
@@ -71,19 +69,6 @@
/* Bitmask of physically existing CPUs */
physid_mask_t phys_cpu_present_map = PHYSID_MASK_NONE;
-/* ACPI MADT entry parsing functions */
-#ifdef CONFIG_ACPI
-extern struct acpi_boot_flags acpi_boot;
-#ifdef CONFIG_X86_LOCAL_APIC
-extern int acpi_parse_lapic (acpi_table_entry_header *header);
-extern int acpi_parse_lapic_addr_ovr (acpi_table_entry_header *header);
-extern int acpi_parse_lapic_nmi (acpi_table_entry_header *header);
-#endif /*CONFIG_X86_LOCAL_APIC*/
-#ifdef CONFIG_X86_IO_APIC
-extern int acpi_parse_ioapic (acpi_table_entry_header *header);
-#endif /*CONFIG_X86_IO_APIC*/
-#endif /*CONFIG_ACPI*/
-
u8 bios_cpu_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = BAD_APICID };
@@ -108,24 +93,20 @@
static void __cpuinit MP_processor_info (struct mpc_config_processor *m)
{
int cpu;
- unsigned char ver;
cpumask_t tmp_map;
+ char *bootup_cpu = "";
if (!(m->mpc_cpuflag & CPU_ENABLED)) {
disabled_cpus++;
return;
}
-
- printk(KERN_INFO "Processor #%d %d:%d APIC version %d\n",
- m->mpc_apicid,
- (m->mpc_cpufeature & CPU_FAMILY_MASK)>>8,
- (m->mpc_cpufeature & CPU_MODEL_MASK)>>4,
- m->mpc_apicver);
-
if (m->mpc_cpuflag & CPU_BOOTPROCESSOR) {
- Dprintk(" Bootup CPU\n");
+ bootup_cpu = " (Bootup-CPU)";
boot_cpu_id = m->mpc_apicid;
}
+
+ printk(KERN_INFO "Processor #%d%s\n", m->mpc_apicid, bootup_cpu);
+
if (num_processors >= NR_CPUS) {
printk(KERN_WARNING "WARNING: NR_CPUS limit of %i reached."
" Processor ignored.\n", NR_CPUS);
@@ -136,24 +117,7 @@
cpus_complement(tmp_map, cpu_present_map);
cpu = first_cpu(tmp_map);
-#if MAX_APICS < 255
- if ((int)m->mpc_apicid > MAX_APICS) {
- printk(KERN_ERR "Processor #%d INVALID. (Max ID: %d).\n",
- m->mpc_apicid, MAX_APICS);
- return;
- }
-#endif
- ver = m->mpc_apicver;
-
physid_set(m->mpc_apicid, phys_cpu_present_map);
- /*
- * Validate version
- */
- if (ver == 0x0) {
- printk(KERN_ERR "BIOS bug, APIC version is 0 for CPU#%d! fixing up to 0x10. (tell your hw vendor)\n", m->mpc_apicid);
- ver = 0x10;
- }
- apic_version[m->mpc_apicid] = ver;
if (m->mpc_cpuflag & CPU_BOOTPROCESSOR) {
/*
* bios_cpu_apicid is required to have processors listed
@@ -178,15 +142,11 @@
Dprintk("Bus #%d is %s\n", m->mpc_busid, str);
if (strncmp(str, "ISA", 3) == 0) {
- mp_bus_id_to_type[m->mpc_busid] = MP_BUS_ISA;
- } else if (strncmp(str, "EISA", 4) == 0) {
- mp_bus_id_to_type[m->mpc_busid] = MP_BUS_EISA;
+ set_bit(m->mpc_busid, mp_bus_not_pci);
} else if (strncmp(str, "PCI", 3) == 0) {
- mp_bus_id_to_type[m->mpc_busid] = MP_BUS_PCI;
+ clear_bit(m->mpc_busid, mp_bus_not_pci);
mp_bus_id_to_pci_bus[m->mpc_busid] = mp_current_pci_id;
mp_current_pci_id++;
- } else if (strncmp(str, "MCA", 3) == 0) {
- mp_bus_id_to_type[m->mpc_busid] = MP_BUS_MCA;
} else {
printk(KERN_ERR "Unknown bustype %s\n", str);
}
@@ -197,8 +157,8 @@
if (!(m->mpc_flags & MPC_APIC_USABLE))
return;
- printk("I/O APIC #%d Version %d at 0x%X.\n",
- m->mpc_apicid, m->mpc_apicver, m->mpc_apicaddr);
+ printk("I/O APIC #%d at 0x%X.\n",
+ m->mpc_apicid, m->mpc_apicaddr);
if (nr_ioapics >= MAX_IO_APICS) {
printk(KERN_ERR "Max # of I/O APICs (%d) exceeded (found %d).\n",
MAX_IO_APICS, nr_ioapics);
@@ -232,19 +192,6 @@
m->mpc_irqtype, m->mpc_irqflag & 3,
(m->mpc_irqflag >> 2) &3, m->mpc_srcbusid,
m->mpc_srcbusirq, m->mpc_destapic, m->mpc_destapiclint);
- /*
- * Well it seems all SMP boards in existence
- * use ExtINT/LVT1 == LINT0 and
- * NMI/LVT2 == LINT1 - the following check
- * will show us if this assumptions is false.
- * Until then we do not have to add baggage.
- */
- if ((m->mpc_irqtype == mp_ExtINT) &&
- (m->mpc_destapiclint != 0))
- BUG();
- if ((m->mpc_irqtype == mp_NMI) &&
- (m->mpc_destapiclint != 1))
- BUG();
}
/*
@@ -258,7 +205,7 @@
unsigned char *mpt=((unsigned char *)mpc)+count;
if (memcmp(mpc->mpc_signature,MPC_SIGNATURE,4)) {
- printk("SMP mptable: bad signature [%c%c%c%c]!\n",
+ printk("MPTABLE: bad signature [%c%c%c%c]!\n",
mpc->mpc_signature[0],
mpc->mpc_signature[1],
mpc->mpc_signature[2],
@@ -266,31 +213,31 @@
return 0;
}
if (mpf_checksum((unsigned char *)mpc,mpc->mpc_length)) {
- printk("SMP mptable: checksum error!\n");
+ printk("MPTABLE: checksum error!\n");
return 0;
}
if (mpc->mpc_spec!=0x01 && mpc->mpc_spec!=0x04) {
- printk(KERN_ERR "SMP mptable: bad table version (%d)!!\n",
+ printk(KERN_ERR "MPTABLE: bad table version (%d)!!\n",
mpc->mpc_spec);
return 0;
}
if (!mpc->mpc_lapic) {
- printk(KERN_ERR "SMP mptable: null local APIC address!\n");
+ printk(KERN_ERR "MPTABLE: null local APIC address!\n");
return 0;
}
memcpy(str,mpc->mpc_oem,8);
- str[8]=0;
- printk(KERN_INFO "OEM ID: %s ",str);
+ str[8] = 0;
+ printk(KERN_INFO "MPTABLE: OEM ID: %s ",str);
memcpy(str,mpc->mpc_productid,12);
- str[12]=0;
- printk("Product ID: %s ",str);
+ str[12] = 0;
+ printk("MPTABLE: Product ID: %s ",str);
- printk("APIC at: 0x%X\n",mpc->mpc_lapic);
+ printk("MPTABLE: APIC at: 0x%X\n",mpc->mpc_lapic);
/* save the local APIC address, it might be non-default */
if (!acpi_lapic)
- mp_lapic_addr = mpc->mpc_lapic;
+ mp_lapic_addr = mpc->mpc_lapic;
/*
* Now process the configuration blocks.
@@ -302,7 +249,7 @@
struct mpc_config_processor *m=
(struct mpc_config_processor *)mpt;
if (!acpi_lapic)
- MP_processor_info(m);
+ MP_processor_info(m);
mpt += sizeof(*m);
count += sizeof(*m);
break;
@@ -321,8 +268,8 @@
struct mpc_config_ioapic *m=
(struct mpc_config_ioapic *)mpt;
MP_ioapic_info(m);
- mpt+=sizeof(*m);
- count+=sizeof(*m);
+ mpt += sizeof(*m);
+ count += sizeof(*m);
break;
}
case MP_INTSRC:
@@ -331,8 +278,8 @@
(struct mpc_config_intsrc *)mpt;
MP_intsrc_info(m);
- mpt+=sizeof(*m);
- count+=sizeof(*m);
+ mpt += sizeof(*m);
+ count += sizeof(*m);
break;
}
case MP_LINTSRC:
@@ -340,15 +287,15 @@
struct mpc_config_lintsrc *m=
(struct mpc_config_lintsrc *)mpt;
MP_lintsrc_info(m);
- mpt+=sizeof(*m);
- count+=sizeof(*m);
+ mpt += sizeof(*m);
+ count += sizeof(*m);
break;
}
}
}
clustered_apic_check();
if (!num_processors)
- printk(KERN_ERR "SMP mptable: no processors registered!\n");
+ printk(KERN_ERR "MPTABLE: no processors registered!\n");
return num_processors;
}
@@ -444,13 +391,10 @@
* 2 CPUs, numbered 0 & 1.
*/
processor.mpc_type = MP_PROCESSOR;
- /* Either an integrated APIC or a discrete 82489DX. */
- processor.mpc_apicver = mpc_default_type > 4 ? 0x10 : 0x01;
+ processor.mpc_apicver = 0;
processor.mpc_cpuflag = CPU_ENABLED;
- processor.mpc_cpufeature = (boot_cpu_data.x86 << 8) |
- (boot_cpu_data.x86_model << 4) |
- boot_cpu_data.x86_mask;
- processor.mpc_featureflag = boot_cpu_data.x86_capability[0];
+ processor.mpc_cpufeature = 0;
+ processor.mpc_featureflag = 0;
processor.mpc_reserved[0] = 0;
processor.mpc_reserved[1] = 0;
for (i = 0; i < 2; i++) {
@@ -469,14 +413,6 @@
case 5:
memcpy(bus.mpc_bustype, "ISA ", 6);
break;
- case 2:
- case 6:
- case 3:
- memcpy(bus.mpc_bustype, "EISA ", 6);
- break;
- case 4:
- case 7:
- memcpy(bus.mpc_bustype, "MCA ", 6);
}
MP_bus_info(&bus);
if (mpc_default_type > 4) {
@@ -487,7 +423,7 @@
ioapic.mpc_type = MP_IOAPIC;
ioapic.mpc_apicid = 2;
- ioapic.mpc_apicver = mpc_default_type > 4 ? 0x10 : 0x01;
+ ioapic.mpc_apicver = 0;
ioapic.mpc_flags = MPC_APIC_USABLE;
ioapic.mpc_apicaddr = 0xFEC00000;
MP_ioapic_info(&ioapic);
@@ -530,13 +466,6 @@
printk(KERN_INFO "Using ACPI for processor (LAPIC) configuration information\n");
printk("Intel MultiProcessor Specification v1.%d\n", mpf->mpf_specification);
- if (mpf->mpf_feature2 & (1<<7)) {
- printk(KERN_INFO " IMCR and PIC compatibility mode.\n");
- pic_mode = 1;
- } else {
- printk(KERN_INFO " Virtual Wire compatibility mode.\n");
- pic_mode = 0;
- }
/*
* Now see if we need to read further.
@@ -616,7 +545,7 @@
return 0;
}
-void __init find_intel_smp (void)
+void __init find_smp_config(void)
{
unsigned int address;
@@ -633,9 +562,7 @@
smp_scan_config(0xF0000,0x10000))
return;
/*
- * If it is an SMP machine we should know now, unless the
- * configuration is in an EISA/MCA bus machine with an
- * extended bios data area.
+ * If it is an SMP machine we should know now.
*
* there is a real-mode segmented pointer pointing to the
* 4K EBDA area at 0x40E, calculate and scan it here.
@@ -656,69 +583,41 @@
printk(KERN_INFO "No mptable found.\n");
}
-/*
- * - Intel MP Configuration Table
- */
-void __init find_smp_config (void)
-{
-#ifdef CONFIG_X86_LOCAL_APIC
- find_intel_smp();
-#endif
-}
-
-
/* --------------------------------------------------------------------------
ACPI-based MP Configuration
-------------------------------------------------------------------------- */
#ifdef CONFIG_ACPI
-void __init mp_register_lapic_address (
- u64 address)
+void __init mp_register_lapic_address(u64 address)
{
mp_lapic_addr = (unsigned long) address;
-
set_fixmap_nocache(FIX_APIC_BASE, mp_lapic_addr);
-
if (boot_cpu_id == -1U)
boot_cpu_id = GET_APIC_ID(apic_read(APIC_ID));
-
- Dprintk("Boot CPU = %d\n", boot_cpu_physical_apicid);
}
-
-void __cpuinit mp_register_lapic (
- u8 id,
- u8 enabled)
+void __cpuinit mp_register_lapic (u8 id, u8 enabled)
{
struct mpc_config_processor processor;
int boot_cpu = 0;
- if (id >= MAX_APICS) {
- printk(KERN_WARNING "Processor #%d invalid (max %d)\n",
- id, MAX_APICS);
- return;
- }
-
- if (id == boot_cpu_physical_apicid)
+ if (id == boot_cpu_id)
boot_cpu = 1;
processor.mpc_type = MP_PROCESSOR;
processor.mpc_apicid = id;
- processor.mpc_apicver = GET_APIC_VERSION(apic_read(APIC_LVR));
+ processor.mpc_apicver = 0;
processor.mpc_cpuflag = (enabled ? CPU_ENABLED : 0);
processor.mpc_cpuflag |= (boot_cpu ? CPU_BOOTPROCESSOR : 0);
- processor.mpc_cpufeature = (boot_cpu_data.x86 << 8) |
- (boot_cpu_data.x86_model << 4) | boot_cpu_data.x86_mask;
- processor.mpc_featureflag = boot_cpu_data.x86_capability[0];
+ processor.mpc_cpufeature = 0;
+ processor.mpc_featureflag = 0;
processor.mpc_reserved[0] = 0;
processor.mpc_reserved[1] = 0;
MP_processor_info(&processor);
}
-#ifdef CONFIG_X86_IO_APIC
-
#define MP_ISA_BUS 0
#define MP_MAX_IOAPIC_PIN 127
@@ -729,11 +628,9 @@
u32 pin_programmed[4];
} mp_ioapic_routing[MAX_IO_APICS];
-
-static int mp_find_ioapic (
- int gsi)
+static int mp_find_ioapic(int gsi)
{
- int i = 0;
+ int i = 0;
/* Find the IOAPIC that manages this GSI. */
for (i = 0; i < nr_ioapics; i++) {
@@ -743,17 +640,12 @@
}
printk(KERN_ERR "ERROR: Unable to locate IOAPIC for GSI %d\n", gsi);
-
return -1;
}
-
-void __init mp_register_ioapic (
- u8 id,
- u32 address,
- u32 gsi_base)
+void __init mp_register_ioapic(u8 id, u32 address, u32 gsi_base)
{
- int idx = 0;
+ int idx = 0;
if (nr_ioapics >= MAX_IO_APICS) {
printk(KERN_ERR "ERROR: Max # of I/O APICs (%d) exceeded "
@@ -774,7 +666,7 @@
set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);
mp_ioapics[idx].mpc_apicid = id;
- mp_ioapics[idx].mpc_apicver = io_apic_get_version(idx);
+ mp_ioapics[idx].mpc_apicver = 0;
/*
* Build basic IRQ lookup table to facilitate gsi->io_apic lookups
@@ -785,21 +677,15 @@
mp_ioapic_routing[idx].gsi_end = gsi_base +
io_apic_get_redir_entries(idx);
- printk(KERN_INFO "IOAPIC[%d]: apic_id %d, version %d, address 0x%x, "
+ printk(KERN_INFO "IOAPIC[%d]: apic_id %d, address 0x%x, "
"GSI %d-%d\n", idx, mp_ioapics[idx].mpc_apicid,
- mp_ioapics[idx].mpc_apicver, mp_ioapics[idx].mpc_apicaddr,
+ mp_ioapics[idx].mpc_apicaddr,
mp_ioapic_routing[idx].gsi_start,
mp_ioapic_routing[idx].gsi_end);
-
- return;
}
-
-void __init mp_override_legacy_irq (
- u8 bus_irq,
- u8 polarity,
- u8 trigger,
- u32 gsi)
+void __init
+mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger, u32 gsi)
{
struct mpc_config_intsrc intsrc;
int ioapic = -1;
@@ -837,22 +723,18 @@
mp_irqs[mp_irq_entries] = intsrc;
if (++mp_irq_entries == MAX_IRQ_SOURCES)
panic("Max # of irq sources exceeded!\n");
-
- return;
}
-
-void __init mp_config_acpi_legacy_irqs (void)
+void __init mp_config_acpi_legacy_irqs(void)
{
struct mpc_config_intsrc intsrc;
- int i = 0;
- int ioapic = -1;
+ int i = 0;
+ int ioapic = -1;
/*
* Fabricate the legacy ISA bus (bus #31).
*/
- mp_bus_id_to_type[MP_ISA_BUS] = MP_BUS_ISA;
- Dprintk("Bus #%d is ISA\n", MP_ISA_BUS);
+ set_bit(MP_ISA_BUS, mp_bus_not_pci);
/*
* Locate the IOAPIC that manages the ISA IRQs (0-15).
@@ -905,24 +787,22 @@
if (++mp_irq_entries == MAX_IRQ_SOURCES)
panic("Max # of irq sources exceeded!\n");
}
-
- return;
}
#define MAX_GSI_NUM 4096
int mp_register_gsi(u32 gsi, int triggering, int polarity)
{
- int ioapic = -1;
- int ioapic_pin = 0;
- int idx, bit = 0;
- static int pci_irq = 16;
+ int ioapic = -1;
+ int ioapic_pin = 0;
+ int idx, bit = 0;
+ static int pci_irq = 16;
/*
* Mapping between Global System Interrupts, which
* represent all possible interrupts, to the IRQs
* assigned to actual devices.
*/
- static int gsi_to_irq[MAX_GSI_NUM];
+ static int gsi_to_irq[MAX_GSI_NUM];
if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
return gsi;
@@ -996,6 +876,4 @@
polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
return gsi;
}
-
-#endif /*CONFIG_X86_IO_APIC*/
#endif /*CONFIG_ACPI*/
diff --git a/arch/x86_64/kernel/nmi.c b/arch/x86_64/kernel/nmi.c
index 5baa0c7..4d6fb04 100644
--- a/arch/x86_64/kernel/nmi.c
+++ b/arch/x86_64/kernel/nmi.c
@@ -28,71 +28,138 @@
#include <asm/mce.h>
#include <asm/intel_arch_perfmon.h>
-/*
- * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
- * - it may be reserved by some other driver, or not
- * - when not reserved by some other driver, it may be used for
- * the NMI watchdog, or not
- *
- * This is maintained separately from nmi_active because the NMI
- * watchdog may also be driven from the I/O APIC timer.
+/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
+ * evtsel_nmi_owner tracks the ownership of the event selection
+ * - different performance counters/ event selection may be reserved for
+ * different subsystems this reservation system just tries to coordinate
+ * things a little
*/
-static DEFINE_SPINLOCK(lapic_nmi_owner_lock);
-static unsigned int lapic_nmi_owner;
-#define LAPIC_NMI_WATCHDOG (1<<0)
-#define LAPIC_NMI_RESERVED (1<<1)
+static DEFINE_PER_CPU(unsigned, perfctr_nmi_owner);
+static DEFINE_PER_CPU(unsigned, evntsel_nmi_owner[2]);
+
+/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
+ * offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
+ */
+#define NMI_MAX_COUNTER_BITS 66
/* nmi_active:
- * +1: the lapic NMI watchdog is active, but can be disabled
- * 0: the lapic NMI watchdog has not been set up, and cannot
+ * >0: the lapic NMI watchdog is active, but can be disabled
+ * <0: the lapic NMI watchdog has not been set up, and cannot
* be enabled
- * -1: the lapic NMI watchdog is disabled, but can be enabled
+ * 0: the lapic NMI watchdog is disabled, but can be enabled
*/
-int nmi_active; /* oprofile uses this */
+atomic_t nmi_active = ATOMIC_INIT(0); /* oprofile uses this */
int panic_on_timeout;
unsigned int nmi_watchdog = NMI_DEFAULT;
static unsigned int nmi_hz = HZ;
-static unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */
-static unsigned int nmi_p4_cccr_val;
-/* Note that these events don't tick when the CPU idles. This means
- the frequency varies with CPU load. */
+struct nmi_watchdog_ctlblk {
+ int enabled;
+ u64 check_bit;
+ unsigned int cccr_msr;
+ unsigned int perfctr_msr; /* the MSR to reset in NMI handler */
+ unsigned int evntsel_msr; /* the MSR to select the events to handle */
+};
+static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
-#define K7_EVNTSEL_ENABLE (1 << 22)
-#define K7_EVNTSEL_INT (1 << 20)
-#define K7_EVNTSEL_OS (1 << 17)
-#define K7_EVNTSEL_USR (1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
-#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+/* local prototypes */
+static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
-#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
-#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
+{
+ /* returns the bit offset of the performance counter register */
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ return (msr - MSR_K7_PERFCTR0);
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
+ return (msr - MSR_ARCH_PERFMON_PERFCTR0);
+ else
+ return (msr - MSR_P4_BPU_PERFCTR0);
+ }
+ return 0;
+}
-#define MSR_P4_MISC_ENABLE 0x1A0
-#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7)
-#define MSR_P4_MISC_ENABLE_PEBS_UNAVAIL (1<<12)
-#define MSR_P4_PERFCTR0 0x300
-#define MSR_P4_CCCR0 0x360
-#define P4_ESCR_EVENT_SELECT(N) ((N)<<25)
-#define P4_ESCR_OS (1<<3)
-#define P4_ESCR_USR (1<<2)
-#define P4_CCCR_OVF_PMI0 (1<<26)
-#define P4_CCCR_OVF_PMI1 (1<<27)
-#define P4_CCCR_THRESHOLD(N) ((N)<<20)
-#define P4_CCCR_COMPLEMENT (1<<19)
-#define P4_CCCR_COMPARE (1<<18)
-#define P4_CCCR_REQUIRED (3<<16)
-#define P4_CCCR_ESCR_SELECT(N) ((N)<<13)
-#define P4_CCCR_ENABLE (1<<12)
-/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
- CRU_ESCR0 (with any non-null event selector) through a complemented
- max threshold. [IA32-Vol3, Section 14.9.9] */
-#define MSR_P4_IQ_COUNTER0 0x30C
-#define P4_NMI_CRU_ESCR0 (P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS|P4_ESCR_USR)
-#define P4_NMI_IQ_CCCR0 \
- (P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT| \
- P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
+{
+ /* returns the bit offset of the event selection register */
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ return (msr - MSR_K7_EVNTSEL0);
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
+ return (msr - MSR_ARCH_PERFMON_EVENTSEL0);
+ else
+ return (msr - MSR_P4_BSU_ESCR0);
+ }
+ return 0;
+}
+
+/* checks for a bit availability (hack for oprofile) */
+int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
+{
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+}
+
+/* checks the an msr for availability */
+int avail_to_resrv_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+}
+
+int reserve_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ if (!test_and_set_bit(counter, &__get_cpu_var(perfctr_nmi_owner)))
+ return 1;
+ return 0;
+}
+
+void release_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ clear_bit(counter, &__get_cpu_var(perfctr_nmi_owner));
+}
+
+int reserve_evntsel_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_evntsel_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ if (!test_and_set_bit(counter, &__get_cpu_var(evntsel_nmi_owner)))
+ return 1;
+ return 0;
+}
+
+void release_evntsel_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_evntsel_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ clear_bit(counter, &__get_cpu_var(evntsel_nmi_owner));
+}
static __cpuinit inline int nmi_known_cpu(void)
{
@@ -109,7 +176,7 @@
}
/* Run after command line and cpu_init init, but before all other checks */
-void __cpuinit nmi_watchdog_default(void)
+void nmi_watchdog_default(void)
{
if (nmi_watchdog != NMI_DEFAULT)
return;
@@ -145,6 +212,12 @@
int *counts;
int cpu;
+ if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DEFAULT))
+ return 0;
+
+ if (!atomic_read(&nmi_active))
+ return 0;
+
counts = kmalloc(NR_CPUS * sizeof(int), GFP_KERNEL);
if (!counts)
return -1;
@@ -162,26 +235,43 @@
mdelay((10*1000)/nmi_hz); // wait 10 ticks
for_each_online_cpu(cpu) {
+ if (!per_cpu(nmi_watchdog_ctlblk, cpu).enabled)
+ continue;
if (cpu_pda(cpu)->__nmi_count - counts[cpu] <= 5) {
- endflag = 1;
printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
cpu,
counts[cpu],
cpu_pda(cpu)->__nmi_count);
- nmi_active = 0;
- lapic_nmi_owner &= ~LAPIC_NMI_WATCHDOG;
- nmi_perfctr_msr = 0;
- kfree(counts);
- return -1;
+ per_cpu(nmi_watchdog_ctlblk, cpu).enabled = 0;
+ atomic_dec(&nmi_active);
}
}
+ if (!atomic_read(&nmi_active)) {
+ kfree(counts);
+ atomic_set(&nmi_active, -1);
+ return -1;
+ }
endflag = 1;
printk("OK.\n");
/* now that we know it works we can reduce NMI frequency to
something more reasonable; makes a difference in some configs */
- if (nmi_watchdog == NMI_LOCAL_APIC)
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
nmi_hz = 1;
+ /*
+ * On Intel CPUs with ARCH_PERFMON only 32 bits in the counter
+ * are writable, with higher bits sign extending from bit 31.
+ * So, we can only program the counter with 31 bit values and
+ * 32nd bit should be 1, for 33.. to be 1.
+ * Find the appropriate nmi_hz
+ */
+ if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0 &&
+ ((u64)cpu_khz * 1000) > 0x7fffffffULL) {
+ nmi_hz = ((u64)cpu_khz * 1000) / 0x7fffffffUL + 1;
+ }
+ }
kfree(counts);
return 0;
@@ -201,91 +291,65 @@
get_option(&str, &nmi);
- if (nmi >= NMI_INVALID)
+ if ((nmi >= NMI_INVALID) || (nmi < NMI_NONE))
return 0;
+
+ if ((nmi == NMI_LOCAL_APIC) && (nmi_known_cpu() == 0))
+ return 0; /* no lapic support */
nmi_watchdog = nmi;
return 1;
}
__setup("nmi_watchdog=", setup_nmi_watchdog);
-static void disable_intel_arch_watchdog(void);
-
static void disable_lapic_nmi_watchdog(void)
{
- if (nmi_active <= 0)
+ BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+ if (atomic_read(&nmi_active) <= 0)
return;
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- wrmsr(MSR_K7_EVNTSEL0, 0, 0);
- break;
- case X86_VENDOR_INTEL:
- if (boot_cpu_data.x86 == 15) {
- wrmsr(MSR_P4_IQ_CCCR0, 0, 0);
- wrmsr(MSR_P4_CRU_ESCR0, 0, 0);
- } else if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
- disable_intel_arch_watchdog();
- }
- break;
- }
- nmi_active = -1;
- /* tell do_nmi() and others that we're not active any more */
- nmi_watchdog = 0;
+
+ on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
+
+ BUG_ON(atomic_read(&nmi_active) != 0);
}
static void enable_lapic_nmi_watchdog(void)
{
- if (nmi_active < 0) {
- nmi_watchdog = NMI_LOCAL_APIC;
- touch_nmi_watchdog();
- setup_apic_nmi_watchdog();
- }
-}
+ BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-int reserve_lapic_nmi(void)
-{
- unsigned int old_owner;
+ /* are we already enabled */
+ if (atomic_read(&nmi_active) != 0)
+ return;
- spin_lock(&lapic_nmi_owner_lock);
- old_owner = lapic_nmi_owner;
- lapic_nmi_owner |= LAPIC_NMI_RESERVED;
- spin_unlock(&lapic_nmi_owner_lock);
- if (old_owner & LAPIC_NMI_RESERVED)
- return -EBUSY;
- if (old_owner & LAPIC_NMI_WATCHDOG)
- disable_lapic_nmi_watchdog();
- return 0;
-}
+ /* are we lapic aware */
+ if (nmi_known_cpu() <= 0)
+ return;
-void release_lapic_nmi(void)
-{
- unsigned int new_owner;
-
- spin_lock(&lapic_nmi_owner_lock);
- new_owner = lapic_nmi_owner & ~LAPIC_NMI_RESERVED;
- lapic_nmi_owner = new_owner;
- spin_unlock(&lapic_nmi_owner_lock);
- if (new_owner & LAPIC_NMI_WATCHDOG)
- enable_lapic_nmi_watchdog();
+ on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
+ touch_nmi_watchdog();
}
void disable_timer_nmi_watchdog(void)
{
- if ((nmi_watchdog != NMI_IO_APIC) || (nmi_active <= 0))
+ BUG_ON(nmi_watchdog != NMI_IO_APIC);
+
+ if (atomic_read(&nmi_active) <= 0)
return;
disable_irq(0);
- unset_nmi_callback();
- nmi_active = -1;
- nmi_watchdog = NMI_NONE;
+ on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
+
+ BUG_ON(atomic_read(&nmi_active) != 0);
}
void enable_timer_nmi_watchdog(void)
{
- if (nmi_active < 0) {
- nmi_watchdog = NMI_IO_APIC;
+ BUG_ON(nmi_watchdog != NMI_IO_APIC);
+
+ if (atomic_read(&nmi_active) == 0) {
touch_nmi_watchdog();
- nmi_active = 1;
+ on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
enable_irq(0);
}
}
@@ -296,15 +360,20 @@
static int lapic_nmi_suspend(struct sys_device *dev, pm_message_t state)
{
- nmi_pm_active = nmi_active;
- disable_lapic_nmi_watchdog();
+ /* only CPU0 goes here, other CPUs should be offline */
+ nmi_pm_active = atomic_read(&nmi_active);
+ stop_apic_nmi_watchdog(NULL);
+ BUG_ON(atomic_read(&nmi_active) != 0);
return 0;
}
static int lapic_nmi_resume(struct sys_device *dev)
{
- if (nmi_pm_active > 0)
- enable_lapic_nmi_watchdog();
+ /* only CPU0 goes here, other CPUs should be offline */
+ if (nmi_pm_active > 0) {
+ setup_apic_nmi_watchdog(NULL);
+ touch_nmi_watchdog();
+ }
return 0;
}
@@ -323,7 +392,13 @@
{
int error;
- if (nmi_active == 0 || nmi_watchdog != NMI_LOCAL_APIC)
+ /* should really be a BUG_ON but b/c this is an
+ * init call, it just doesn't work. -dcz
+ */
+ if (nmi_watchdog != NMI_LOCAL_APIC)
+ return 0;
+
+ if ( atomic_read(&nmi_active) < 0 )
return 0;
error = sysdev_class_register(&nmi_sysclass);
@@ -341,74 +416,209 @@
* Original code written by Keith Owens.
*/
-static void clear_msr_range(unsigned int base, unsigned int n)
-{
- unsigned int i;
+/* Note that these events don't tick when the CPU idles. This means
+ the frequency varies with CPU load. */
- for(i = 0; i < n; ++i)
- wrmsr(base+i, 0, 0);
-}
+#define K7_EVNTSEL_ENABLE (1 << 22)
+#define K7_EVNTSEL_INT (1 << 20)
+#define K7_EVNTSEL_OS (1 << 17)
+#define K7_EVNTSEL_USR (1 << 16)
+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
+#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
-static void setup_k7_watchdog(void)
+static int setup_k7_watchdog(void)
{
- int i;
+ unsigned int perfctr_msr, evntsel_msr;
unsigned int evntsel;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
- nmi_perfctr_msr = MSR_K7_PERFCTR0;
+ perfctr_msr = MSR_K7_PERFCTR0;
+ evntsel_msr = MSR_K7_EVNTSEL0;
+ if (!reserve_perfctr_nmi(perfctr_msr))
+ goto fail;
- for(i = 0; i < 4; ++i) {
- /* Simulator may not support it */
- if (checking_wrmsrl(MSR_K7_EVNTSEL0+i, 0UL)) {
- nmi_perfctr_msr = 0;
- return;
- }
- wrmsrl(MSR_K7_PERFCTR0+i, 0UL);
- }
+ if (!reserve_evntsel_nmi(evntsel_msr))
+ goto fail1;
+
+ /* Simulator may not support it */
+ if (checking_wrmsrl(evntsel_msr, 0UL))
+ goto fail2;
+ wrmsrl(perfctr_msr, 0UL);
evntsel = K7_EVNTSEL_INT
| K7_EVNTSEL_OS
| K7_EVNTSEL_USR
| K7_NMI_EVENT;
- wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
- wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz * 1000 / nmi_hz));
+ /* setup the timer */
+ wrmsr(evntsel_msr, evntsel, 0);
+ wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
apic_write(APIC_LVTPC, APIC_DM_NMI);
evntsel |= K7_EVNTSEL_ENABLE;
- wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
+ wrmsr(evntsel_msr, evntsel, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = 0; //unused
+ wd->check_bit = 1ULL<<63;
+ return 1;
+fail2:
+ release_evntsel_nmi(evntsel_msr);
+fail1:
+ release_perfctr_nmi(perfctr_msr);
+fail:
+ return 0;
}
-static void disable_intel_arch_watchdog(void)
+static void stop_k7_watchdog(void)
{
- unsigned ebx;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
- /*
- * Check whether the Architectural PerfMon supports
- * Unhalted Core Cycles Event or not.
- * NOTE: Corresponding bit = 0 in ebp indicates event present.
- */
- ebx = cpuid_ebx(10);
- if (!(ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
- wrmsr(MSR_ARCH_PERFMON_EVENTSEL0, 0, 0);
+ wrmsr(wd->evntsel_msr, 0, 0);
+
+ release_evntsel_nmi(wd->evntsel_msr);
+ release_perfctr_nmi(wd->perfctr_msr);
}
+/* Note that these events don't tick when the CPU idles. This means
+ the frequency varies with CPU load. */
+
+#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7)
+#define P4_ESCR_EVENT_SELECT(N) ((N)<<25)
+#define P4_ESCR_OS (1<<3)
+#define P4_ESCR_USR (1<<2)
+#define P4_CCCR_OVF_PMI0 (1<<26)
+#define P4_CCCR_OVF_PMI1 (1<<27)
+#define P4_CCCR_THRESHOLD(N) ((N)<<20)
+#define P4_CCCR_COMPLEMENT (1<<19)
+#define P4_CCCR_COMPARE (1<<18)
+#define P4_CCCR_REQUIRED (3<<16)
+#define P4_CCCR_ESCR_SELECT(N) ((N)<<13)
+#define P4_CCCR_ENABLE (1<<12)
+#define P4_CCCR_OVF (1<<31)
+/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
+ CRU_ESCR0 (with any non-null event selector) through a complemented
+ max threshold. [IA32-Vol3, Section 14.9.9] */
+
+static int setup_p4_watchdog(void)
+{
+ unsigned int perfctr_msr, evntsel_msr, cccr_msr;
+ unsigned int evntsel, cccr_val;
+ unsigned int misc_enable, dummy;
+ unsigned int ht_num;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
+ if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
+ return 0;
+
+#ifdef CONFIG_SMP
+ /* detect which hyperthread we are on */
+ if (smp_num_siblings == 2) {
+ unsigned int ebx, apicid;
+
+ ebx = cpuid_ebx(1);
+ apicid = (ebx >> 24) & 0xff;
+ ht_num = apicid & 1;
+ } else
+#endif
+ ht_num = 0;
+
+ /* performance counters are shared resources
+ * assign each hyperthread its own set
+ * (re-use the ESCR0 register, seems safe
+ * and keeps the cccr_val the same)
+ */
+ if (!ht_num) {
+ /* logical cpu 0 */
+ perfctr_msr = MSR_P4_IQ_PERFCTR0;
+ evntsel_msr = MSR_P4_CRU_ESCR0;
+ cccr_msr = MSR_P4_IQ_CCCR0;
+ cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+ } else {
+ /* logical cpu 1 */
+ perfctr_msr = MSR_P4_IQ_PERFCTR1;
+ evntsel_msr = MSR_P4_CRU_ESCR0;
+ cccr_msr = MSR_P4_IQ_CCCR1;
+ cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
+ }
+
+ if (!reserve_perfctr_nmi(perfctr_msr))
+ goto fail;
+
+ if (!reserve_evntsel_nmi(evntsel_msr))
+ goto fail1;
+
+ evntsel = P4_ESCR_EVENT_SELECT(0x3F)
+ | P4_ESCR_OS
+ | P4_ESCR_USR;
+
+ cccr_val |= P4_CCCR_THRESHOLD(15)
+ | P4_CCCR_COMPLEMENT
+ | P4_CCCR_COMPARE
+ | P4_CCCR_REQUIRED;
+
+ wrmsr(evntsel_msr, evntsel, 0);
+ wrmsr(cccr_msr, cccr_val, 0);
+ wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ cccr_val |= P4_CCCR_ENABLE;
+ wrmsr(cccr_msr, cccr_val, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = cccr_msr;
+ wd->check_bit = 1ULL<<39;
+ return 1;
+fail1:
+ release_perfctr_nmi(perfctr_msr);
+fail:
+ return 0;
+}
+
+static void stop_p4_watchdog(void)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ wrmsr(wd->cccr_msr, 0, 0);
+ wrmsr(wd->evntsel_msr, 0, 0);
+
+ release_evntsel_nmi(wd->evntsel_msr);
+ release_perfctr_nmi(wd->perfctr_msr);
+}
+
+#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
+#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
+
static int setup_intel_arch_watchdog(void)
{
+ unsigned int ebx;
+ union cpuid10_eax eax;
+ unsigned int unused;
+ unsigned int perfctr_msr, evntsel_msr;
unsigned int evntsel;
- unsigned ebx;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
/*
* Check whether the Architectural PerfMon supports
* Unhalted Core Cycles Event or not.
- * NOTE: Corresponding bit = 0 in ebp indicates event present.
+ * NOTE: Corresponding bit = 0 in ebx indicates event present.
*/
- ebx = cpuid_ebx(10);
- if ((ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
- return 0;
+ cpuid(10, &(eax.full), &ebx, &unused, &unused);
+ if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
+ (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
+ goto fail;
- nmi_perfctr_msr = MSR_ARCH_PERFMON_PERFCTR0;
+ perfctr_msr = MSR_ARCH_PERFMON_PERFCTR0;
+ evntsel_msr = MSR_ARCH_PERFMON_EVENTSEL0;
- clear_msr_range(MSR_ARCH_PERFMON_EVENTSEL0, 2);
- clear_msr_range(MSR_ARCH_PERFMON_PERFCTR0, 2);
+ if (!reserve_perfctr_nmi(perfctr_msr))
+ goto fail;
+
+ if (!reserve_evntsel_nmi(evntsel_msr))
+ goto fail1;
+
+ wrmsrl(perfctr_msr, 0UL);
evntsel = ARCH_PERFMON_EVENTSEL_INT
| ARCH_PERFMON_EVENTSEL_OS
@@ -416,84 +626,122 @@
| ARCH_PERFMON_NMI_EVENT_SEL
| ARCH_PERFMON_NMI_EVENT_UMASK;
- wrmsr(MSR_ARCH_PERFMON_EVENTSEL0, evntsel, 0);
- wrmsrl(MSR_ARCH_PERFMON_PERFCTR0, -((u64)cpu_khz * 1000 / nmi_hz));
+ /* setup the timer */
+ wrmsr(evntsel_msr, evntsel, 0);
+ wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
+
apic_write(APIC_LVTPC, APIC_DM_NMI);
evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE;
- wrmsr(MSR_ARCH_PERFMON_EVENTSEL0, evntsel, 0);
+ wrmsr(evntsel_msr, evntsel, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = 0; //unused
+ wd->check_bit = 1ULL << (eax.split.bit_width - 1);
return 1;
+fail1:
+ release_perfctr_nmi(perfctr_msr);
+fail:
+ return 0;
}
-
-static int setup_p4_watchdog(void)
+static void stop_intel_arch_watchdog(void)
{
- unsigned int misc_enable, dummy;
+ unsigned int ebx;
+ union cpuid10_eax eax;
+ unsigned int unused;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
- rdmsr(MSR_P4_MISC_ENABLE, misc_enable, dummy);
- if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
- return 0;
+ /*
+ * Check whether the Architectural PerfMon supports
+ * Unhalted Core Cycles Event or not.
+ * NOTE: Corresponding bit = 0 in ebx indicates event present.
+ */
+ cpuid(10, &(eax.full), &ebx, &unused, &unused);
+ if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
+ (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
+ return;
- nmi_perfctr_msr = MSR_P4_IQ_COUNTER0;
- nmi_p4_cccr_val = P4_NMI_IQ_CCCR0;
-#ifdef CONFIG_SMP
- if (smp_num_siblings == 2)
- nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1;
-#endif
+ wrmsr(wd->evntsel_msr, 0, 0);
- if (!(misc_enable & MSR_P4_MISC_ENABLE_PEBS_UNAVAIL))
- clear_msr_range(0x3F1, 2);
- /* MSR 0x3F0 seems to have a default value of 0xFC00, but current
- docs doesn't fully define it, so leave it alone for now. */
- if (boot_cpu_data.x86_model >= 0x3) {
- /* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
- clear_msr_range(0x3A0, 26);
- clear_msr_range(0x3BC, 3);
- } else {
- clear_msr_range(0x3A0, 31);
- }
- clear_msr_range(0x3C0, 6);
- clear_msr_range(0x3C8, 6);
- clear_msr_range(0x3E0, 2);
- clear_msr_range(MSR_P4_CCCR0, 18);
- clear_msr_range(MSR_P4_PERFCTR0, 18);
-
- wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0);
- wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0);
- Dprintk("setting P4_IQ_COUNTER0 to 0x%08lx\n", -(cpu_khz * 1000UL / nmi_hz));
- wrmsrl(MSR_P4_IQ_COUNTER0, -((u64)cpu_khz * 1000 / nmi_hz));
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
- return 1;
+ release_evntsel_nmi(wd->evntsel_msr);
+ release_perfctr_nmi(wd->perfctr_msr);
}
-void setup_apic_nmi_watchdog(void)
+void setup_apic_nmi_watchdog(void *unused)
{
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- if (boot_cpu_data.x86 != 15)
- return;
- if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
- return;
- setup_k7_watchdog();
- break;
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
- if (!setup_intel_arch_watchdog())
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ /* only support LOCAL and IO APICs for now */
+ if ((nmi_watchdog != NMI_LOCAL_APIC) &&
+ (nmi_watchdog != NMI_IO_APIC))
+ return;
+
+ if (wd->enabled == 1)
+ return;
+
+ /* cheap hack to support suspend/resume */
+ /* if cpu0 is not active neither should the other cpus */
+ if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
+ return;
+
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
return;
- } else if (boot_cpu_data.x86 == 15) {
+ if (!setup_k7_watchdog())
+ return;
+ break;
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+ if (!setup_intel_arch_watchdog())
+ return;
+ break;
+ }
if (!setup_p4_watchdog())
return;
- } else {
+ break;
+ default:
return;
}
-
- break;
-
- default:
- return;
}
- lapic_nmi_owner = LAPIC_NMI_WATCHDOG;
- nmi_active = 1;
+ wd->enabled = 1;
+ atomic_inc(&nmi_active);
+}
+
+void stop_apic_nmi_watchdog(void *unused)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ /* only support LOCAL and IO APICs for now */
+ if ((nmi_watchdog != NMI_LOCAL_APIC) &&
+ (nmi_watchdog != NMI_IO_APIC))
+ return;
+
+ if (wd->enabled == 0)
+ return;
+
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
+ return;
+ stop_k7_watchdog();
+ break;
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+ stop_intel_arch_watchdog();
+ break;
+ }
+ stop_p4_watchdog();
+ break;
+ default:
+ return;
+ }
+ }
+ wd->enabled = 0;
+ atomic_dec(&nmi_active);
}
/*
@@ -526,93 +774,109 @@
touch_softlockup_watchdog();
}
-void __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
+int __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
{
int sum;
int touched = 0;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+ u64 dummy;
+ int rc=0;
+
+ /* check for other users first */
+ if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
+ == NOTIFY_STOP) {
+ rc = 1;
+ touched = 1;
+ }
sum = read_pda(apic_timer_irqs);
if (__get_cpu_var(nmi_touch)) {
__get_cpu_var(nmi_touch) = 0;
touched = 1;
}
+
#ifdef CONFIG_X86_MCE
/* Could check oops_in_progress here too, but it's safer
not too */
if (atomic_read(&mce_entry) > 0)
touched = 1;
#endif
+ /* if the apic timer isn't firing, this cpu isn't doing much */
if (!touched && __get_cpu_var(last_irq_sum) == sum) {
/*
* Ayiee, looks like this CPU is stuck ...
* wait a few IRQs (5 seconds) before doing the oops ...
*/
local_inc(&__get_cpu_var(alert_counter));
- if (local_read(&__get_cpu_var(alert_counter)) == 5*nmi_hz) {
- if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
- == NOTIFY_STOP) {
- local_set(&__get_cpu_var(alert_counter), 0);
- return;
- }
- die_nmi("NMI Watchdog detected LOCKUP on CPU %d\n", regs);
- }
+ if (local_read(&__get_cpu_var(alert_counter)) == 5*nmi_hz)
+ die_nmi("NMI Watchdog detected LOCKUP on CPU %d\n", regs,
+ panic_on_timeout);
} else {
__get_cpu_var(last_irq_sum) = sum;
local_set(&__get_cpu_var(alert_counter), 0);
}
- if (nmi_perfctr_msr) {
- if (nmi_perfctr_msr == MSR_P4_IQ_COUNTER0) {
- /*
- * P4 quirks:
- * - An overflown perfctr will assert its interrupt
- * until the OVF flag in its CCCR is cleared.
- * - LVTPC is masked on interrupt and must be
- * unmasked by the LVTPC handler.
- */
- wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- } else if (nmi_perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0) {
- /*
- * For Intel based architectural perfmon
- * - LVTPC is masked on interrupt and must be
- * unmasked by the LVTPC handler.
+
+ /* see if the nmi watchdog went off */
+ if (wd->enabled) {
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ rdmsrl(wd->perfctr_msr, dummy);
+ if (dummy & wd->check_bit){
+ /* this wasn't a watchdog timer interrupt */
+ goto done;
+ }
+
+ /* only Intel uses the cccr msr */
+ if (wd->cccr_msr != 0) {
+ /*
+ * P4 quirks:
+ * - An overflown perfctr will assert its interrupt
+ * until the OVF flag in its CCCR is cleared.
+ * - LVTPC is masked on interrupt and must be
+ * unmasked by the LVTPC handler.
+ */
+ rdmsrl(wd->cccr_msr, dummy);
+ dummy &= ~P4_CCCR_OVF;
+ wrmsrl(wd->cccr_msr, dummy);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ } else if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR0) {
+ /*
+ * ArchPerfom/Core Duo needs to re-unmask
+ * the apic vector
+ */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ }
+ /* start the cycle over again */
+ wrmsrl(wd->perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
+ rc = 1;
+ } else if (nmi_watchdog == NMI_IO_APIC) {
+ /* don't know how to accurately check for this.
+ * just assume it was a watchdog timer interrupt
+ * This matches the old behaviour.
*/
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- }
- wrmsrl(nmi_perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
+ rc = 1;
+ } else
+ printk(KERN_WARNING "Unknown enabled NMI hardware?!\n");
}
+done:
+ return rc;
}
-static __kprobes int dummy_nmi_callback(struct pt_regs * regs, int cpu)
-{
- return 0;
-}
-
-static nmi_callback_t nmi_callback = dummy_nmi_callback;
-
asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code)
{
- int cpu = safe_smp_processor_id();
-
nmi_enter();
add_pda(__nmi_count,1);
- if (!rcu_dereference(nmi_callback)(regs, cpu))
- default_do_nmi(regs);
+ default_do_nmi(regs);
nmi_exit();
}
-void set_nmi_callback(nmi_callback_t callback)
+int do_nmi_callback(struct pt_regs * regs, int cpu)
{
- vmalloc_sync_all();
- rcu_assign_pointer(nmi_callback, callback);
+#ifdef CONFIG_SYSCTL
+ if (unknown_nmi_panic)
+ return unknown_nmi_panic_callback(regs, cpu);
+#endif
+ return 0;
}
-EXPORT_SYMBOL_GPL(set_nmi_callback);
-
-void unset_nmi_callback(void)
-{
- nmi_callback = dummy_nmi_callback;
-}
-EXPORT_SYMBOL_GPL(unset_nmi_callback);
#ifdef CONFIG_SYSCTL
@@ -621,36 +885,42 @@
unsigned char reason = get_nmi_reason();
char buf[64];
- if (!(reason & 0xc0)) {
- sprintf(buf, "NMI received for unknown reason %02x\n", reason);
- die_nmi(buf,regs);
- }
+ sprintf(buf, "NMI received for unknown reason %02x\n", reason);
+ die_nmi(buf, regs, 1); /* Always panic here */
return 0;
}
/*
- * proc handler for /proc/sys/kernel/unknown_nmi_panic
+ * proc handler for /proc/sys/kernel/nmi
*/
-int proc_unknown_nmi_panic(struct ctl_table *table, int write, struct file *file,
+int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file,
void __user *buffer, size_t *length, loff_t *ppos)
{
int old_state;
- old_state = unknown_nmi_panic;
+ nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0;
+ old_state = nmi_watchdog_enabled;
proc_dointvec(table, write, file, buffer, length, ppos);
- if (!!old_state == !!unknown_nmi_panic)
+ if (!!old_state == !!nmi_watchdog_enabled)
return 0;
- if (unknown_nmi_panic) {
- if (reserve_lapic_nmi() < 0) {
- unknown_nmi_panic = 0;
- return -EBUSY;
- } else {
- set_nmi_callback(unknown_nmi_panic_callback);
- }
+ if (atomic_read(&nmi_active) < 0) {
+ printk( KERN_WARNING "NMI watchdog is permanently disabled\n");
+ return -EIO;
+ }
+
+ /* if nmi_watchdog is not set yet, then set it */
+ nmi_watchdog_default();
+
+ if (nmi_watchdog == NMI_LOCAL_APIC) {
+ if (nmi_watchdog_enabled)
+ enable_lapic_nmi_watchdog();
+ else
+ disable_lapic_nmi_watchdog();
} else {
- release_lapic_nmi();
- unset_nmi_callback();
+ printk( KERN_WARNING
+ "NMI watchdog doesn't know what hardware to touch\n");
+ return -EIO;
}
return 0;
}
@@ -659,8 +929,12 @@
EXPORT_SYMBOL(nmi_active);
EXPORT_SYMBOL(nmi_watchdog);
-EXPORT_SYMBOL(reserve_lapic_nmi);
-EXPORT_SYMBOL(release_lapic_nmi);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
+EXPORT_SYMBOL(reserve_perfctr_nmi);
+EXPORT_SYMBOL(release_perfctr_nmi);
+EXPORT_SYMBOL(reserve_evntsel_nmi);
+EXPORT_SYMBOL(release_evntsel_nmi);
EXPORT_SYMBOL(disable_timer_nmi_watchdog);
EXPORT_SYMBOL(enable_timer_nmi_watchdog);
EXPORT_SYMBOL(touch_nmi_watchdog);
diff --git a/arch/x86_64/kernel/pci-calgary.c b/arch/x86_64/kernel/pci-calgary.c
index 146924b..cfb09b0 100644
--- a/arch/x86_64/kernel/pci-calgary.c
+++ b/arch/x86_64/kernel/pci-calgary.c
@@ -86,7 +86,8 @@
#define MAX_NUM_OF_PHBS 8 /* how many PHBs in total? */
#define MAX_NUM_CHASSIS 8 /* max number of chassis */
-#define MAX_PHB_BUS_NUM (MAX_NUM_OF_PHBS * MAX_NUM_CHASSIS * 2) /* max dev->bus->number */
+/* MAX_PHB_BUS_NUM is the maximal possible dev->bus->number */
+#define MAX_PHB_BUS_NUM (MAX_NUM_OF_PHBS * MAX_NUM_CHASSIS * 2)
#define PHBS_PER_CALGARY 4
/* register offsets in Calgary's internal register space */
@@ -111,31 +112,49 @@
0xB000 /* PHB3 */
};
-static char bus_to_phb[MAX_PHB_BUS_NUM];
-void* tce_table_kva[MAX_PHB_BUS_NUM];
unsigned int specified_table_size = TCE_TABLE_SIZE_UNSPECIFIED;
static int translate_empty_slots __read_mostly = 0;
static int calgary_detected __read_mostly = 0;
-/*
- * the bitmap of PHBs the user requested that we disable
- * translation on.
- */
-static DECLARE_BITMAP(translation_disabled, MAX_PHB_BUS_NUM);
+struct calgary_bus_info {
+ void *tce_space;
+ unsigned char translation_disabled;
+ signed char phbid;
+};
+
+static struct calgary_bus_info bus_info[MAX_PHB_BUS_NUM] = { { NULL, 0, 0 }, };
static void tce_cache_blast(struct iommu_table *tbl);
/* enable this to stress test the chip's TCE cache */
#ifdef CONFIG_IOMMU_DEBUG
-static inline void tce_cache_blast_stress(struct iommu_table *tbl)
+int debugging __read_mostly = 1;
+
+static inline unsigned long verify_bit_range(unsigned long* bitmap,
+ int expected, unsigned long start, unsigned long end)
{
- tce_cache_blast(tbl);
+ unsigned long idx = start;
+
+ BUG_ON(start >= end);
+
+ while (idx < end) {
+ if (!!test_bit(idx, bitmap) != expected)
+ return idx;
+ ++idx;
+ }
+
+ /* all bits have the expected value */
+ return ~0UL;
}
-#else
-static inline void tce_cache_blast_stress(struct iommu_table *tbl)
+#else /* debugging is disabled */
+int debugging __read_mostly = 0;
+
+static inline unsigned long verify_bit_range(unsigned long* bitmap,
+ int expected, unsigned long start, unsigned long end)
{
+ return ~0UL;
}
-#endif /* BLAST_TCE_CACHE_ON_UNMAP */
+#endif /* CONFIG_IOMMU_DEBUG */
static inline unsigned int num_dma_pages(unsigned long dma, unsigned int dmalen)
{
@@ -149,7 +168,7 @@
static inline int translate_phb(struct pci_dev* dev)
{
- int disabled = test_bit(dev->bus->number, translation_disabled);
+ int disabled = bus_info[dev->bus->number].translation_disabled;
return !disabled;
}
@@ -158,6 +177,7 @@
{
unsigned long index;
unsigned long end;
+ unsigned long badbit;
index = start_addr >> PAGE_SHIFT;
@@ -169,14 +189,15 @@
if (end > tbl->it_size) /* don't go off the table */
end = tbl->it_size;
- while (index < end) {
- if (test_bit(index, tbl->it_map))
+ badbit = verify_bit_range(tbl->it_map, 0, index, end);
+ if (badbit != ~0UL) {
+ if (printk_ratelimit())
printk(KERN_ERR "Calgary: entry already allocated at "
"0x%lx tbl %p dma 0x%lx npages %u\n",
- index, tbl, start_addr, npages);
- ++index;
+ badbit, tbl, start_addr, npages);
}
- set_bit_string(tbl->it_map, start_addr >> PAGE_SHIFT, npages);
+
+ set_bit_string(tbl->it_map, index, npages);
}
static unsigned long iommu_range_alloc(struct iommu_table *tbl,
@@ -243,7 +264,7 @@
unsigned int npages)
{
unsigned long entry;
- unsigned long i;
+ unsigned long badbit;
entry = dma_addr >> PAGE_SHIFT;
@@ -251,16 +272,15 @@
tce_free(tbl, entry, npages);
- for (i = 0; i < npages; ++i) {
- if (!test_bit(entry + i, tbl->it_map))
+ badbit = verify_bit_range(tbl->it_map, 1, entry, entry + npages);
+ if (badbit != ~0UL) {
+ if (printk_ratelimit())
printk(KERN_ERR "Calgary: bit is off at 0x%lx "
"tbl %p dma 0x%Lx entry 0x%lx npages %u\n",
- entry + i, tbl, dma_addr, entry, npages);
+ badbit, tbl, dma_addr, entry, npages);
}
__clear_bit_string(tbl->it_map, entry, npages);
-
- tce_cache_blast_stress(tbl);
}
static void iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr,
@@ -454,7 +474,7 @@
static inline int busno_to_phbid(unsigned char num)
{
- return bus_to_phb[num];
+ return bus_info[num].phbid;
}
static inline unsigned long split_queue_offset(unsigned char num)
@@ -631,6 +651,10 @@
if (ret)
return ret;
+ tbl = dev->sysdata;
+ tbl->it_base = (unsigned long)bus_info[dev->bus->number].tce_space;
+ tce_free(tbl, 0, tbl->it_size);
+
calgary_reserve_regions(dev);
/* set TARs for each PHB */
@@ -654,11 +678,12 @@
return 0;
}
-static void __init calgary_free_tar(struct pci_dev *dev)
+static void __init calgary_free_bus(struct pci_dev *dev)
{
u64 val64;
struct iommu_table *tbl = dev->sysdata;
void __iomem *target;
+ unsigned int bitmapsz;
target = calgary_reg(tbl->bbar, tar_offset(dev->bus->number));
val64 = be64_to_cpu(readq(target));
@@ -666,8 +691,15 @@
writeq(cpu_to_be64(val64), target);
readq(target); /* flush */
+ bitmapsz = tbl->it_size / BITS_PER_BYTE;
+ free_pages((unsigned long)tbl->it_map, get_order(bitmapsz));
+ tbl->it_map = NULL;
+
kfree(tbl);
dev->sysdata = NULL;
+
+ /* Can't free bootmem allocated memory after system is up :-( */
+ bus_info[dev->bus->number].tce_space = NULL;
}
static void calgary_watchdog(unsigned long data)
@@ -772,12 +804,11 @@
return address;
}
-static int __init calgary_init_one_nontraslated(struct pci_dev *dev)
+static void __init calgary_init_one_nontraslated(struct pci_dev *dev)
{
+ pci_dev_get(dev);
dev->sysdata = NULL;
dev->bus->self = dev;
-
- return 0;
}
static int __init calgary_init_one(struct pci_dev *dev)
@@ -798,6 +829,7 @@
if (ret)
goto iounmap;
+ pci_dev_get(dev);
dev->bus->self = dev;
calgary_enable_translation(dev);
@@ -824,10 +856,9 @@
calgary_init_one_nontraslated(dev);
continue;
}
- if (!tce_table_kva[dev->bus->number] && !translate_empty_slots) {
- pci_dev_put(dev);
+ if (!bus_info[dev->bus->number].tce_space && !translate_empty_slots)
continue;
- }
+
ret = calgary_init_one(dev);
if (ret)
goto error;
@@ -840,15 +871,18 @@
dev = pci_find_device_reverse(PCI_VENDOR_ID_IBM,
PCI_DEVICE_ID_IBM_CALGARY,
dev);
+ if (!dev)
+ break;
if (!translate_phb(dev)) {
pci_dev_put(dev);
continue;
}
- if (!tce_table_kva[dev->bus->number] && !translate_empty_slots)
+ if (!bus_info[dev->bus->number].tce_space && !translate_empty_slots)
continue;
+
calgary_disable_translation(dev);
- calgary_free_tar(dev);
- pci_dev_put(dev);
+ calgary_free_bus(dev);
+ pci_dev_put(dev); /* Undo calgary_init_one()'s pci_dev_get() */
}
return ret;
@@ -890,13 +924,15 @@
if (swiotlb || no_iommu || iommu_detected)
return;
+ if (!early_pci_allowed())
+ return;
+
specified_table_size = determine_tce_table_size(end_pfn * PAGE_SIZE);
for (bus = 0; bus < MAX_PHB_BUS_NUM; bus++) {
int dev;
-
- tce_table_kva[bus] = NULL;
- bus_to_phb[bus] = -1;
+ struct calgary_bus_info *info = &bus_info[bus];
+ info->phbid = -1;
if (read_pci_config(bus, 0, 0, 0) != PCI_VENDOR_DEVICE_ID_CALGARY)
continue;
@@ -907,12 +943,9 @@
*/
phb = (phb + 1) % PHBS_PER_CALGARY;
- if (test_bit(bus, translation_disabled)) {
- printk(KERN_INFO "Calgary: translation is disabled for "
- "PHB 0x%x\n", bus);
- /* skip this phb, don't allocate a tbl for it */
+ if (info->translation_disabled)
continue;
- }
+
/*
* Scan the slots of the PCI bus to see if there is a device present.
* The parent bus will be the zero-ith device, so start at 1.
@@ -923,8 +956,8 @@
tbl = alloc_tce_table();
if (!tbl)
goto cleanup;
- tce_table_kva[bus] = tbl;
- bus_to_phb[bus] = phb;
+ info->tce_space = tbl;
+ info->phbid = phb;
calgary_found = 1;
break;
}
@@ -934,15 +967,20 @@
if (calgary_found) {
iommu_detected = 1;
calgary_detected = 1;
- printk(KERN_INFO "PCI-DMA: Calgary IOMMU detected. "
- "TCE table spec is %d.\n", specified_table_size);
+ printk(KERN_INFO "PCI-DMA: Calgary IOMMU detected.\n");
+ printk(KERN_INFO "PCI-DMA: Calgary TCE table spec is %d, "
+ "CONFIG_IOMMU_DEBUG is %s.\n", specified_table_size,
+ debugging ? "enabled" : "disabled");
}
return;
cleanup:
- for (--bus; bus >= 0; --bus)
- if (tce_table_kva[bus])
- free_tce_table(tce_table_kva[bus]);
+ for (--bus; bus >= 0; --bus) {
+ struct calgary_bus_info *info = &bus_info[bus];
+
+ if (info->tce_space)
+ free_tce_table(info->tce_space);
+ }
}
int __init calgary_iommu_init(void)
@@ -1016,7 +1054,7 @@
if (bridge < MAX_PHB_BUS_NUM) {
printk(KERN_INFO "Calgary: disabling "
"translation for PHB 0x%x\n", bridge);
- set_bit(bridge, translation_disabled);
+ bus_info[bridge].translation_disabled = 1;
}
}
diff --git a/arch/x86_64/kernel/pci-dma.c b/arch/x86_64/kernel/pci-dma.c
index 9c44f4f..4dcb671 100644
--- a/arch/x86_64/kernel/pci-dma.c
+++ b/arch/x86_64/kernel/pci-dma.c
@@ -236,6 +236,9 @@
{
iommu_merge = 1;
+ if (!p)
+ return -EINVAL;
+
while (*p) {
if (!strncmp(p,"off",3))
no_iommu = 1;
@@ -278,9 +281,9 @@
if (*p == ',')
++p;
}
- return 1;
+ return 0;
}
-__setup("iommu=", iommu_setup);
+early_param("iommu", iommu_setup);
void __init pci_iommu_alloc(void)
{
diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index 6d3e61b..16261a8 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -239,8 +239,6 @@
{
unsigned long phys_mem, bus;
- BUG_ON(dir == DMA_NONE);
-
if (!dev)
dev = &fallback_dev;
@@ -383,7 +381,6 @@
unsigned long pages = 0;
int need = 0, nextneed;
- BUG_ON(dir == DMA_NONE);
if (nents == 0)
return 0;
diff --git a/arch/x86_64/kernel/pci-nommu.c b/arch/x86_64/kernel/pci-nommu.c
index aad7609..df09ab0 100644
--- a/arch/x86_64/kernel/pci-nommu.c
+++ b/arch/x86_64/kernel/pci-nommu.c
@@ -59,7 +59,6 @@
{
int i;
- BUG_ON(direction == DMA_NONE);
for (i = 0; i < nents; i++ ) {
struct scatterlist *s = &sg[i];
BUG_ON(!s->page);
diff --git a/arch/x86_64/kernel/process.c b/arch/x86_64/kernel/process.c
index bb6745d..458006a 100644
--- a/arch/x86_64/kernel/process.c
+++ b/arch/x86_64/kernel/process.c
@@ -80,25 +80,25 @@
}
EXPORT_SYMBOL(idle_notifier_unregister);
-enum idle_state { CPU_IDLE, CPU_NOT_IDLE };
-static DEFINE_PER_CPU(enum idle_state, idle_state) = CPU_NOT_IDLE;
-
void enter_idle(void)
{
- __get_cpu_var(idle_state) = CPU_IDLE;
+ write_pda(isidle, 1);
atomic_notifier_call_chain(&idle_notifier, IDLE_START, NULL);
}
static void __exit_idle(void)
{
- __get_cpu_var(idle_state) = CPU_NOT_IDLE;
+ if (read_pda(isidle) == 0)
+ return;
+ write_pda(isidle, 0);
atomic_notifier_call_chain(&idle_notifier, IDLE_END, NULL);
}
/* Called from interrupts to signify idle end */
void exit_idle(void)
{
- if (current->pid | read_pda(irqcount))
+ /* idle loop has pid 0 */
+ if (current->pid)
return;
__exit_idle();
}
@@ -220,6 +220,9 @@
play_dead();
enter_idle();
idle();
+ /* In many cases the interrupt that ended idle
+ has already called exit_idle. But some idle
+ loops can be woken up without interrupt. */
__exit_idle();
}
@@ -350,6 +353,7 @@
kfree(t->io_bitmap_ptr);
t->io_bitmap_ptr = NULL;
+ clear_thread_flag(TIF_IO_BITMAP);
/*
* Careful, clear this in the TSS too:
*/
@@ -369,6 +373,7 @@
if (t->flags & _TIF_IA32)
current_thread_info()->status |= TS_COMPAT;
}
+ t->flags &= ~_TIF_DEBUG;
tsk->thread.debugreg0 = 0;
tsk->thread.debugreg1 = 0;
@@ -461,7 +466,7 @@
asm("mov %%es,%0" : "=m" (p->thread.es));
asm("mov %%ds,%0" : "=m" (p->thread.ds));
- if (unlikely(me->thread.io_bitmap_ptr != NULL)) {
+ if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
if (!p->thread.io_bitmap_ptr) {
p->thread.io_bitmap_max = 0;
@@ -469,6 +474,7 @@
}
memcpy(p->thread.io_bitmap_ptr, me->thread.io_bitmap_ptr,
IO_BITMAP_BYTES);
+ set_tsk_thread_flag(p, TIF_IO_BITMAP);
}
/*
@@ -498,6 +504,40 @@
*/
#define loaddebug(thread,r) set_debugreg(thread->debugreg ## r, r)
+static inline void __switch_to_xtra(struct task_struct *prev_p,
+ struct task_struct *next_p,
+ struct tss_struct *tss)
+{
+ struct thread_struct *prev, *next;
+
+ prev = &prev_p->thread,
+ next = &next_p->thread;
+
+ if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
+ loaddebug(next, 0);
+ loaddebug(next, 1);
+ loaddebug(next, 2);
+ loaddebug(next, 3);
+ /* no 4 and 5 */
+ loaddebug(next, 6);
+ loaddebug(next, 7);
+ }
+
+ if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+ /*
+ * Copy the relevant range of the IO bitmap.
+ * Normally this is 128 bytes or less:
+ */
+ memcpy(tss->io_bitmap, next->io_bitmap_ptr,
+ max(prev->io_bitmap_max, next->io_bitmap_max));
+ } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
+ /*
+ * Clear any possible leftover bits:
+ */
+ memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
+ }
+}
+
/*
* switch_to(x,y) should switch tasks from x to y.
*
@@ -515,6 +555,10 @@
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
+ /* we're going to use this soon, after a few expensive things */
+ if (next_p->fpu_counter>5)
+ prefetch(&next->i387.fxsave);
+
/*
* Reload esp0, LDT and the page table pointer:
*/
@@ -583,41 +627,29 @@
And the AMD workaround requires it to be after DS reload. */
unlazy_fpu(prev_p);
write_pda(kernelstack,
- task_stack_page(next_p) + THREAD_SIZE - PDA_STACKOFFSET);
+ (unsigned long)task_stack_page(next_p) + THREAD_SIZE - PDA_STACKOFFSET);
+#ifdef CONFIG_CC_STACKPROTECTOR
+ write_pda(stack_canary, next_p->stack_canary);
+ /*
+ * Build time only check to make sure the stack_canary is at
+ * offset 40 in the pda; this is a gcc ABI requirement
+ */
+ BUILD_BUG_ON(offsetof(struct x8664_pda, stack_canary) != 40);
+#endif
/*
- * Now maybe reload the debug registers
+ * Now maybe reload the debug registers and handle I/O bitmaps
*/
- if (unlikely(next->debugreg7)) {
- loaddebug(next, 0);
- loaddebug(next, 1);
- loaddebug(next, 2);
- loaddebug(next, 3);
- /* no 4 and 5 */
- loaddebug(next, 6);
- loaddebug(next, 7);
- }
+ if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW))
+ || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP))
+ __switch_to_xtra(prev_p, next_p, tss);
-
- /*
- * Handle the IO bitmap
- */
- if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr)) {
- if (next->io_bitmap_ptr)
- /*
- * Copy the relevant range of the IO bitmap.
- * Normally this is 128 bytes or less:
- */
- memcpy(tss->io_bitmap, next->io_bitmap_ptr,
- max(prev->io_bitmap_max, next->io_bitmap_max));
- else {
- /*
- * Clear any possible leftover bits:
- */
- memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
- }
- }
-
+ /* If the task has used fpu the last 5 timeslices, just do a full
+ * restore of the math state immediately to avoid the trap; the
+ * chances of needing FPU soon are obviously high now
+ */
+ if (next_p->fpu_counter>5)
+ math_state_restore();
return prev_p;
}
@@ -834,7 +866,7 @@
unsigned long arch_align_stack(unsigned long sp)
{
- if (randomize_va_space)
+ if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
sp -= get_random_int() % 8192;
return sp & ~0xf;
}
diff --git a/arch/x86_64/kernel/ptrace.c b/arch/x86_64/kernel/ptrace.c
index 2d50024..addc14a 100644
--- a/arch/x86_64/kernel/ptrace.c
+++ b/arch/x86_64/kernel/ptrace.c
@@ -116,17 +116,17 @@
return addr;
}
-static int is_at_popf(struct task_struct *child, struct pt_regs *regs)
+static int is_setting_trap_flag(struct task_struct *child, struct pt_regs *regs)
{
int i, copied;
- unsigned char opcode[16];
+ unsigned char opcode[15];
unsigned long addr = convert_rip_to_linear(child, regs);
copied = access_process_vm(child, addr, opcode, sizeof(opcode), 0);
for (i = 0; i < copied; i++) {
switch (opcode[i]) {
- /* popf */
- case 0x9d:
+ /* popf and iret */
+ case 0x9d: case 0xcf:
return 1;
/* CHECKME: 64 65 */
@@ -138,14 +138,17 @@
case 0x26: case 0x2e:
case 0x36: case 0x3e:
case 0x64: case 0x65:
- case 0xf0: case 0xf2: case 0xf3:
+ case 0xf2: case 0xf3:
continue;
- /* REX prefixes */
case 0x40 ... 0x4f:
+ if (regs->cs != __USER_CS)
+ /* 32-bit mode: register increment */
+ return 0;
+ /* 64-bit mode: REX prefix */
continue;
- /* CHECKME: f0, f2, f3 */
+ /* CHECKME: f2, f3 */
/*
* pushf: NOTE! We should probably not let
@@ -186,10 +189,8 @@
* ..but if TF is changed by the instruction we will trace,
* don't mark it as being "us" that set it, so that we
* won't clear it by hand later.
- *
- * AK: this is not enough, LAHF and IRET can change TF in user space too.
*/
- if (is_at_popf(child, regs))
+ if (is_setting_trap_flag(child, regs))
return;
child->ptrace |= PT_DTRACE;
@@ -420,9 +421,13 @@
if ((0x5554 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
break;
if (i == 4) {
- child->thread.debugreg7 = data;
+ child->thread.debugreg7 = data;
+ if (data)
+ set_tsk_thread_flag(child, TIF_DEBUG);
+ else
+ clear_tsk_thread_flag(child, TIF_DEBUG);
ret = 0;
- }
+ }
break;
}
break;
diff --git a/arch/x86_64/kernel/relocate_kernel.S b/arch/x86_64/kernel/relocate_kernel.S
index d24fa9b..14e9587 100644
--- a/arch/x86_64/kernel/relocate_kernel.S
+++ b/arch/x86_64/kernel/relocate_kernel.S
@@ -7,31 +7,169 @@
*/
#include <linux/linkage.h>
+#include <asm/page.h>
+#include <asm/kexec.h>
- /*
- * Must be relocatable PIC code callable as a C function, that once
- * it starts can not use the previous processes stack.
- */
- .globl relocate_new_kernel
+/*
+ * Must be relocatable PIC code callable as a C function
+ */
+
+#define PTR(x) (x << 3)
+#define PAGE_ALIGNED (1 << PAGE_SHIFT)
+#define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */
+
+ .text
+ .align PAGE_ALIGNED
.code64
-relocate_new_kernel:
- /* %rdi page_list
- * %rsi reboot_code_buffer
+ .globl relocate_kernel
+relocate_kernel:
+ /* %rdi indirection_page
+ * %rsi page_list
* %rdx start address
- * %rcx page_table
- * %r8 arg5
- * %r9 arg6
+ */
+
+ /* map the control page at its virtual address */
+
+ movq $0x0000ff8000000000, %r10 /* mask */
+ mov $(39 - 3), %cl /* bits to shift */
+ movq PTR(VA_CONTROL_PAGE)(%rsi), %r11 /* address to map */
+
+ movq %r11, %r9
+ andq %r10, %r9
+ shrq %cl, %r9
+
+ movq PTR(VA_PGD)(%rsi), %r8
+ addq %r8, %r9
+ movq PTR(PA_PUD_0)(%rsi), %r8
+ orq $PAGE_ATTR, %r8
+ movq %r8, (%r9)
+
+ shrq $9, %r10
+ sub $9, %cl
+
+ movq %r11, %r9
+ andq %r10, %r9
+ shrq %cl, %r9
+
+ movq PTR(VA_PUD_0)(%rsi), %r8
+ addq %r8, %r9
+ movq PTR(PA_PMD_0)(%rsi), %r8
+ orq $PAGE_ATTR, %r8
+ movq %r8, (%r9)
+
+ shrq $9, %r10
+ sub $9, %cl
+
+ movq %r11, %r9
+ andq %r10, %r9
+ shrq %cl, %r9
+
+ movq PTR(VA_PMD_0)(%rsi), %r8
+ addq %r8, %r9
+ movq PTR(PA_PTE_0)(%rsi), %r8
+ orq $PAGE_ATTR, %r8
+ movq %r8, (%r9)
+
+ shrq $9, %r10
+ sub $9, %cl
+
+ movq %r11, %r9
+ andq %r10, %r9
+ shrq %cl, %r9
+
+ movq PTR(VA_PTE_0)(%rsi), %r8
+ addq %r8, %r9
+ movq PTR(PA_CONTROL_PAGE)(%rsi), %r8
+ orq $PAGE_ATTR, %r8
+ movq %r8, (%r9)
+
+ /* identity map the control page at its physical address */
+
+ movq $0x0000ff8000000000, %r10 /* mask */
+ mov $(39 - 3), %cl /* bits to shift */
+ movq PTR(PA_CONTROL_PAGE)(%rsi), %r11 /* address to map */
+
+ movq %r11, %r9
+ andq %r10, %r9
+ shrq %cl, %r9
+
+ movq PTR(VA_PGD)(%rsi), %r8
+ addq %r8, %r9
+ movq PTR(PA_PUD_1)(%rsi), %r8
+ orq $PAGE_ATTR, %r8
+ movq %r8, (%r9)
+
+ shrq $9, %r10
+ sub $9, %cl
+
+ movq %r11, %r9
+ andq %r10, %r9
+ shrq %cl, %r9
+
+ movq PTR(VA_PUD_1)(%rsi), %r8
+ addq %r8, %r9
+ movq PTR(PA_PMD_1)(%rsi), %r8
+ orq $PAGE_ATTR, %r8
+ movq %r8, (%r9)
+
+ shrq $9, %r10
+ sub $9, %cl
+
+ movq %r11, %r9
+ andq %r10, %r9
+ shrq %cl, %r9
+
+ movq PTR(VA_PMD_1)(%rsi), %r8
+ addq %r8, %r9
+ movq PTR(PA_PTE_1)(%rsi), %r8
+ orq $PAGE_ATTR, %r8
+ movq %r8, (%r9)
+
+ shrq $9, %r10
+ sub $9, %cl
+
+ movq %r11, %r9
+ andq %r10, %r9
+ shrq %cl, %r9
+
+ movq PTR(VA_PTE_1)(%rsi), %r8
+ addq %r8, %r9
+ movq PTR(PA_CONTROL_PAGE)(%rsi), %r8
+ orq $PAGE_ATTR, %r8
+ movq %r8, (%r9)
+
+relocate_new_kernel:
+ /* %rdi indirection_page
+ * %rsi page_list
+ * %rdx start address
*/
/* zero out flags, and disable interrupts */
pushq $0
popfq
- /* set a new stack at the bottom of our page... */
- lea 4096(%rsi), %rsp
+ /* get physical address of control page now */
+ /* this is impossible after page table switch */
+ movq PTR(PA_CONTROL_PAGE)(%rsi), %r8
- /* store the parameters back on the stack */
- pushq %rdx /* store the start address */
+ /* get physical address of page table now too */
+ movq PTR(PA_TABLE_PAGE)(%rsi), %rcx
+
+ /* switch to new set of page tables */
+ movq PTR(PA_PGD)(%rsi), %r9
+ movq %r9, %cr3
+
+ /* setup a new stack at the end of the physical control page */
+ lea 4096(%r8), %rsp
+
+ /* jump to identity mapped page */
+ addq $(identity_mapped - relocate_kernel), %r8
+ pushq %r8
+ ret
+
+identity_mapped:
+ /* store the start address on the stack */
+ pushq %rdx
/* Set cr0 to a known state:
* 31 1 == Paging enabled
@@ -136,8 +274,3 @@
xorq %r15, %r15
ret
-relocate_new_kernel_end:
-
- .globl relocate_new_kernel_size
-relocate_new_kernel_size:
- .quad relocate_new_kernel_end - relocate_new_kernel
diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index 4b39f0d..f98e48c 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -74,16 +74,6 @@
unsigned long mmu_cr4_features;
-int acpi_disabled;
-EXPORT_SYMBOL(acpi_disabled);
-#ifdef CONFIG_ACPI
-extern int __initdata acpi_ht;
-extern acpi_interrupt_flags acpi_sci_flags;
-int __initdata acpi_force = 0;
-#endif
-
-int acpi_numa __initdata;
-
/* Boot loader ID as an integer, for the benefit of proc_dointvec */
int bootloader_type;
@@ -107,7 +97,6 @@
struct edid_info edid_info;
EXPORT_SYMBOL_GPL(edid_info);
-struct e820map e820;
extern int root_mountflags;
@@ -276,184 +265,21 @@
}
}
-/* Check for full argument with no trailing characters */
-static int fullarg(char *p, char *arg)
-{
- int l = strlen(arg);
- return !memcmp(p, arg, l) && (p[l] == 0 || isspace(p[l]));
-}
-
-static __init void parse_cmdline_early (char ** cmdline_p)
-{
- char c = ' ', *to = command_line, *from = COMMAND_LINE;
- int len = 0;
- int userdef = 0;
-
- for (;;) {
- if (c != ' ')
- goto next_char;
-
-#ifdef CONFIG_SMP
- /*
- * If the BIOS enumerates physical processors before logical,
- * maxcpus=N at enumeration-time can be used to disable HT.
- */
- else if (!memcmp(from, "maxcpus=", 8)) {
- extern unsigned int maxcpus;
-
- maxcpus = simple_strtoul(from + 8, NULL, 0);
- }
-#endif
-#ifdef CONFIG_ACPI
- /* "acpi=off" disables both ACPI table parsing and interpreter init */
- if (fullarg(from,"acpi=off"))
- disable_acpi();
-
- if (fullarg(from, "acpi=force")) {
- /* add later when we do DMI horrors: */
- acpi_force = 1;
- acpi_disabled = 0;
- }
-
- /* acpi=ht just means: do ACPI MADT parsing
- at bootup, but don't enable the full ACPI interpreter */
- if (fullarg(from, "acpi=ht")) {
- if (!acpi_force)
- disable_acpi();
- acpi_ht = 1;
- }
- else if (fullarg(from, "pci=noacpi"))
- acpi_disable_pci();
- else if (fullarg(from, "acpi=noirq"))
- acpi_noirq_set();
-
- else if (fullarg(from, "acpi_sci=edge"))
- acpi_sci_flags.trigger = 1;
- else if (fullarg(from, "acpi_sci=level"))
- acpi_sci_flags.trigger = 3;
- else if (fullarg(from, "acpi_sci=high"))
- acpi_sci_flags.polarity = 1;
- else if (fullarg(from, "acpi_sci=low"))
- acpi_sci_flags.polarity = 3;
-
- /* acpi=strict disables out-of-spec workarounds */
- else if (fullarg(from, "acpi=strict")) {
- acpi_strict = 1;
- }
-#ifdef CONFIG_X86_IO_APIC
- else if (fullarg(from, "acpi_skip_timer_override"))
- acpi_skip_timer_override = 1;
-#endif
-#endif
-
- if (fullarg(from, "disable_timer_pin_1"))
- disable_timer_pin_1 = 1;
- if (fullarg(from, "enable_timer_pin_1"))
- disable_timer_pin_1 = -1;
-
- if (fullarg(from, "nolapic") || fullarg(from, "disableapic")) {
- clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
- disable_apic = 1;
- }
-
- if (fullarg(from, "noapic"))
- skip_ioapic_setup = 1;
-
- if (fullarg(from,"apic")) {
- skip_ioapic_setup = 0;
- ioapic_force = 1;
- }
-
- if (!memcmp(from, "mem=", 4))
- parse_memopt(from+4, &from);
-
- if (!memcmp(from, "memmap=", 7)) {
- /* exactmap option is for used defined memory */
- if (!memcmp(from+7, "exactmap", 8)) {
-#ifdef CONFIG_CRASH_DUMP
- /* If we are doing a crash dump, we
- * still need to know the real mem
- * size before original memory map is
- * reset.
- */
- saved_max_pfn = e820_end_of_ram();
-#endif
- from += 8+7;
- end_pfn_map = 0;
- e820.nr_map = 0;
- userdef = 1;
- }
- else {
- parse_memmapopt(from+7, &from);
- userdef = 1;
- }
- }
-
-#ifdef CONFIG_NUMA
- if (!memcmp(from, "numa=", 5))
- numa_setup(from+5);
-#endif
-
- if (!memcmp(from,"iommu=",6)) {
- iommu_setup(from+6);
- }
-
- if (fullarg(from,"oops=panic"))
- panic_on_oops = 1;
-
- if (!memcmp(from, "noexec=", 7))
- nonx_setup(from + 7);
-
-#ifdef CONFIG_KEXEC
- /* crashkernel=size@addr specifies the location to reserve for
- * a crash kernel. By reserving this memory we guarantee
- * that linux never set's it up as a DMA target.
- * Useful for holding code to do something appropriate
- * after a kernel panic.
- */
- else if (!memcmp(from, "crashkernel=", 12)) {
- unsigned long size, base;
- size = memparse(from+12, &from);
- if (*from == '@') {
- base = memparse(from+1, &from);
- /* FIXME: Do I want a sanity check
- * to validate the memory range?
- */
- crashk_res.start = base;
- crashk_res.end = base + size - 1;
- }
- }
-#endif
-
#ifdef CONFIG_PROC_VMCORE
- /* elfcorehdr= specifies the location of elf core header
- * stored by the crashed kernel. This option will be passed
- * by kexec loader to the capture kernel.
- */
- else if(!memcmp(from, "elfcorehdr=", 11))
- elfcorehdr_addr = memparse(from+11, &from);
-#endif
-
-#ifdef CONFIG_HOTPLUG_CPU
- else if (!memcmp(from, "additional_cpus=", 16))
- setup_additional_cpus(from+16);
-#endif
-
- next_char:
- c = *(from++);
- if (!c)
- break;
- if (COMMAND_LINE_SIZE <= ++len)
- break;
- *(to++) = c;
- }
- if (userdef) {
- printk(KERN_INFO "user-defined physical RAM map:\n");
- e820_print_map("user");
- }
- *to = '\0';
- *cmdline_p = command_line;
+/* elfcorehdr= specifies the location of elf core header
+ * stored by the crashed kernel. This option will be passed
+ * by kexec loader to the capture kernel.
+ */
+static int __init setup_elfcorehdr(char *arg)
+{
+ char *end;
+ if (!arg)
+ return -EINVAL;
+ elfcorehdr_addr = memparse(arg, &end);
+ return end > arg ? 0 : -EINVAL;
}
+early_param("elfcorehdr", setup_elfcorehdr);
+#endif
#ifndef CONFIG_NUMA
static void __init
@@ -521,6 +347,8 @@
void __init setup_arch(char **cmdline_p)
{
+ printk(KERN_INFO "Command line: %s\n", saved_command_line);
+
ROOT_DEV = old_decode_dev(ORIG_ROOT_DEV);
screen_info = SCREEN_INFO;
edid_info = EDID_INFO;
@@ -547,16 +375,21 @@
data_resource.start = virt_to_phys(&_etext);
data_resource.end = virt_to_phys(&_edata)-1;
- parse_cmdline_early(cmdline_p);
-
early_identify_cpu(&boot_cpu_data);
+ strlcpy(command_line, saved_command_line, COMMAND_LINE_SIZE);
+ *cmdline_p = command_line;
+
+ parse_early_param();
+
+ finish_e820_parsing();
+
/*
* partially used pages are not usable - thus
* we are rounding upwards:
*/
end_pfn = e820_end_of_ram();
- num_physpages = end_pfn; /* for pfn_valid */
+ num_physpages = end_pfn;
check_efer();
@@ -576,6 +409,11 @@
acpi_boot_table_init();
#endif
+ /* How many end-of-memory variables you have, grandma! */
+ max_low_pfn = end_pfn;
+ max_pfn = end_pfn;
+ high_memory = (void *)__va(end_pfn * PAGE_SIZE - 1) + 1;
+
#ifdef CONFIG_ACPI_NUMA
/*
* Parse SRAT to discover nodes.
@@ -625,12 +463,10 @@
*/
acpi_reserve_bootmem();
#endif
-#ifdef CONFIG_X86_LOCAL_APIC
/*
* Find and reserve possible boot-time SMP configuration:
*/
find_smp_config();
-#endif
#ifdef CONFIG_BLK_DEV_INITRD
if (LOADER_TYPE && INITRD_START) {
if (INITRD_START + INITRD_SIZE <= (end_pfn << PAGE_SHIFT)) {
@@ -657,7 +493,9 @@
paging_init();
- check_ioapic();
+#ifdef CONFIG_PCI
+ early_quirks();
+#endif
/*
* set this early, so we dont allocate cpu0
@@ -674,14 +512,12 @@
init_cpu_to_node();
-#ifdef CONFIG_X86_LOCAL_APIC
/*
* get boot-time SMP configuration:
*/
if (smp_found_config)
get_smp_config();
init_apic_mappings();
-#endif
/*
* Request address space for all standard RAM and ROM resources
@@ -839,7 +675,7 @@
#endif
}
-static void __init init_amd(struct cpuinfo_x86 *c)
+static void __cpuinit init_amd(struct cpuinfo_x86 *c)
{
unsigned level;
@@ -895,6 +731,12 @@
/* Fix cpuid4 emulation for more */
num_cache_leaves = 3;
+
+ /* When there is only one core no need to synchronize RDTSC */
+ if (num_possible_cpus() == 1)
+ set_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
+ else
+ clear_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
}
static void __cpuinit detect_ht(struct cpuinfo_x86 *c)
@@ -976,8 +818,7 @@
node = first_node(node_online_map);
numa_set_node(cpu, node);
- if (acpi_numa > 0)
- printk(KERN_INFO "CPU %d/%x -> Node %d\n", cpu, apicid, node);
+ printk(KERN_INFO "CPU %d/%x -> Node %d\n", cpu, apicid, node);
#endif
}
@@ -1011,6 +852,8 @@
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
(c->x86 == 0x6 && c->x86_model >= 0x0e))
set_bit(X86_FEATURE_CONSTANT_TSC, &c->x86_capability);
+ if (c->x86 == 6)
+ set_bit(X86_FEATURE_REP_GOOD, &c->x86_capability);
set_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
c->x86_max_cores = intel_num_cpu_cores(c);
@@ -1229,8 +1072,8 @@
/* Intel-defined (#2) */
"pni", NULL, NULL, "monitor", "ds_cpl", "vmx", "smx", "est",
- "tm2", NULL, "cid", NULL, NULL, "cx16", "xtpr", NULL,
- NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+ "tm2", "ssse3", "cid", NULL, NULL, "cx16", "xtpr", NULL,
+ NULL, NULL, "dca", NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* VIA/Cyrix/Centaur-defined */
diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 417de56..8c4b80f 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -24,7 +24,7 @@
#include <asm/proto.h>
#include <asm/sections.h>
-char x86_boot_params[BOOT_PARAM_SIZE] __initdata = {0,};
+char x86_boot_params[BOOT_PARAM_SIZE] __initdata;
cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
@@ -46,8 +46,10 @@
on Enable(default)
off Disable
*/
-int __init nonx_setup(char *str)
+static int __init nonx_setup(char *str)
{
+ if (!str)
+ return -EINVAL;
if (!strncmp(str, "on", 2)) {
__supported_pte_mask |= _PAGE_NX;
do_not_nx = 0;
@@ -55,9 +57,9 @@
do_not_nx = 1;
__supported_pte_mask &= ~_PAGE_NX;
}
- return 1;
+ return 0;
}
-__setup("noexec=", nonx_setup); /* parsed early actually */
+early_param("noexec", nonx_setup);
int force_personality32 = 0;
@@ -93,12 +95,9 @@
#endif
/* Copy section for each CPU (we discard the original) */
- size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
-#ifdef CONFIG_MODULES
- if (size < PERCPU_ENOUGH_ROOM)
- size = PERCPU_ENOUGH_ROOM;
-#endif
+ size = PERCPU_ENOUGH_ROOM;
+ printk(KERN_INFO "PERCPU: Allocating %lu bytes of per cpu data\n", size);
for_each_cpu_mask (i, cpu_possible_map) {
char *ptr;
@@ -122,7 +121,10 @@
/* Setup up data that may be needed in __get_free_pages early */
asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0));
+ /* Memory clobbers used to order PDA accessed */
+ mb();
wrmsrl(MSR_GS_BASE, pda);
+ mb();
pda->cpunumber = cpu;
pda->irqcount = -1;
@@ -178,6 +180,8 @@
}
}
+unsigned long kernel_eflags;
+
/*
* cpu_init() initializes state that is per-CPU. Some data is already
* initialized (naturally) in the bootstrap process, such as the GDT
@@ -235,28 +239,17 @@
* set up and load the per-CPU TSS
*/
for (v = 0; v < N_EXCEPTION_STACKS; v++) {
+ static const unsigned int order[N_EXCEPTION_STACKS] = {
+ [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STACK_ORDER,
+ [DEBUG_STACK - 1] = DEBUG_STACK_ORDER
+ };
if (cpu) {
- static const unsigned int order[N_EXCEPTION_STACKS] = {
- [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STACK_ORDER,
- [DEBUG_STACK - 1] = DEBUG_STACK_ORDER
- };
-
estacks = (char *)__get_free_pages(GFP_ATOMIC, order[v]);
if (!estacks)
panic("Cannot allocate exception stack %ld %d\n",
v, cpu);
}
- switch (v + 1) {
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
- case DEBUG_STACK:
- cpu_pda(cpu)->debugstack = (unsigned long)estacks;
- estacks += DEBUG_STKSZ;
- break;
-#endif
- default:
- estacks += EXCEPTION_STKSZ;
- break;
- }
+ estacks += PAGE_SIZE << order[v];
orig_ist->ist[v] = t->ist[v] = (unsigned long)estacks;
}
@@ -290,4 +283,6 @@
set_debugreg(0UL, 7);
fpu_init();
+
+ raw_local_save_flags(kernel_eflags);
}
diff --git a/arch/x86_64/kernel/signal.c b/arch/x86_64/kernel/signal.c
index 2816117..49ec324 100644
--- a/arch/x86_64/kernel/signal.c
+++ b/arch/x86_64/kernel/signal.c
@@ -38,37 +38,6 @@
sigset_t *set, struct pt_regs * regs);
asmlinkage long
-sys_rt_sigsuspend(sigset_t __user *unewset, size_t sigsetsize, struct pt_regs *regs)
-{
- sigset_t saveset, newset;
-
- /* XXX: Don't preclude handling different sized sigset_t's. */
- if (sigsetsize != sizeof(sigset_t))
- return -EINVAL;
-
- if (copy_from_user(&newset, unewset, sizeof(newset)))
- return -EFAULT;
- sigdelsetmask(&newset, ~_BLOCKABLE);
-
- spin_lock_irq(¤t->sighand->siglock);
- saveset = current->blocked;
- current->blocked = newset;
- recalc_sigpending();
- spin_unlock_irq(¤t->sighand->siglock);
-#ifdef DEBUG_SIG
- printk("rt_sigsuspend savset(%lx) newset(%lx) regs(%p) rip(%lx)\n",
- saveset, newset, regs, regs->rip);
-#endif
- regs->rax = -EINTR;
- while (1) {
- current->state = TASK_INTERRUPTIBLE;
- schedule();
- if (do_signal(regs, &saveset))
- return -EINTR;
- }
-}
-
-asmlinkage long
sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss,
struct pt_regs *regs)
{
@@ -308,11 +277,6 @@
#endif
/* Set up registers for signal handler */
- {
- struct exec_domain *ed = current_thread_info()->exec_domain;
- if (unlikely(ed && ed->signal_invmap && sig < 32))
- sig = ed->signal_invmap[sig];
- }
regs->rdi = sig;
/* In case the signal handler was declared without prototypes */
regs->rax = 0;
@@ -341,11 +305,11 @@
current->comm, current->pid, frame, regs->rip, frame->pretcode);
#endif
- return 1;
+ return 0;
give_sigsegv:
force_sigsegv(sig, current);
- return 0;
+ return -EFAULT;
}
/*
@@ -408,7 +372,7 @@
#endif
ret = setup_rt_frame(sig, ka, info, oldset, regs);
- if (ret) {
+ if (ret == 0) {
spin_lock_irq(¤t->sighand->siglock);
sigorsets(¤t->blocked,¤t->blocked,&ka->sa.sa_mask);
if (!(ka->sa.sa_flags & SA_NODEFER))
@@ -425,11 +389,12 @@
* want to handle. Thus you cannot kill init even with a SIGKILL even by
* mistake.
*/
-int do_signal(struct pt_regs *regs, sigset_t *oldset)
+static void do_signal(struct pt_regs *regs)
{
struct k_sigaction ka;
siginfo_t info;
int signr;
+ sigset_t *oldset;
/*
* We want the common case to go fast, which
@@ -438,9 +403,11 @@
* if so.
*/
if (!user_mode(regs))
- return 1;
+ return;
- if (!oldset)
+ if (test_thread_flag(TIF_RESTORE_SIGMASK))
+ oldset = ¤t->saved_sigmask;
+ else
oldset = ¤t->blocked;
signr = get_signal_to_deliver(&info, &ka, regs, NULL);
@@ -454,30 +421,46 @@
set_debugreg(current->thread.debugreg7, 7);
/* Whee! Actually deliver the signal. */
- return handle_signal(signr, &info, &ka, oldset, regs);
+ if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
+ /* a signal was successfully delivered; the saved
+ * sigmask will have been stored in the signal frame,
+ * and will be restored by sigreturn, so we can simply
+ * clear the TIF_RESTORE_SIGMASK flag */
+ clear_thread_flag(TIF_RESTORE_SIGMASK);
+ }
+ return;
}
/* Did we come from a system call? */
if ((long)regs->orig_rax >= 0) {
/* Restart the system call - no handlers present */
long res = regs->rax;
- if (res == -ERESTARTNOHAND ||
- res == -ERESTARTSYS ||
- res == -ERESTARTNOINTR) {
+ switch (res) {
+ case -ERESTARTNOHAND:
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
regs->rax = regs->orig_rax;
regs->rip -= 2;
- }
- if (regs->rax == (unsigned long)-ERESTART_RESTARTBLOCK) {
+ break;
+ case -ERESTART_RESTARTBLOCK:
regs->rax = test_thread_flag(TIF_IA32) ?
__NR_ia32_restart_syscall :
__NR_restart_syscall;
regs->rip -= 2;
+ break;
}
}
- return 0;
+
+ /* if there's no signal to deliver, we just put the saved sigmask
+ back. */
+ if (test_thread_flag(TIF_RESTORE_SIGMASK)) {
+ clear_thread_flag(TIF_RESTORE_SIGMASK);
+ sigprocmask(SIG_SETMASK, ¤t->saved_sigmask, NULL);
+ }
}
-void do_notify_resume(struct pt_regs *regs, sigset_t *oldset, __u32 thread_info_flags)
+void
+do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
{
#ifdef DEBUG_SIG
printk("do_notify_resume flags:%x rip:%lx rsp:%lx caller:%lx pending:%lx\n",
@@ -491,8 +474,8 @@
}
/* deal with pending signal delivery */
- if (thread_info_flags & _TIF_SIGPENDING)
- do_signal(regs,oldset);
+ if (thread_info_flags & (_TIF_SIGPENDING|_TIF_RESTORE_SIGMASK))
+ do_signal(regs);
}
void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
diff --git a/arch/x86_64/kernel/smp.c b/arch/x86_64/kernel/smp.c
index 06af6ca..4f67697 100644
--- a/arch/x86_64/kernel/smp.c
+++ b/arch/x86_64/kernel/smp.c
@@ -522,26 +522,3 @@
}
}
-int safe_smp_processor_id(void)
-{
- unsigned apicid, i;
-
- if (disable_apic)
- return 0;
-
- apicid = hard_smp_processor_id();
- if (apicid < NR_CPUS && x86_cpu_to_apicid[apicid] == apicid)
- return apicid;
-
- for (i = 0; i < NR_CPUS; ++i) {
- if (x86_cpu_to_apicid[i] == apicid)
- return i;
- }
-
- /* No entries in x86_cpu_to_apicid? Either no MPS|ACPI,
- * or called too early. Either way, we must be CPU 0. */
- if (x86_cpu_to_apicid[0] == BAD_APICID)
- return 0;
-
- return 0; /* Should not happen */
-}
diff --git a/arch/x86_64/kernel/smpboot.c b/arch/x86_64/kernel/smpboot.c
index 3ae9ffd..7b7a687 100644
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -1091,7 +1091,6 @@
/*
* Switch from PIC to APIC mode.
*/
- connect_bsp_APIC();
setup_local_APIC();
if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) {
@@ -1176,12 +1175,9 @@
void __init smp_cpus_done(unsigned int max_cpus)
{
smp_cleanup_boot();
-
-#ifdef CONFIG_X86_IO_APIC
setup_ioapic_dest();
-#endif
-
check_nmi_watchdog();
+ time_init_gtod();
}
#ifdef CONFIG_HOTPLUG_CPU
@@ -1234,6 +1230,8 @@
if (cpu == 0)
return -EBUSY;
+ if (nmi_watchdog == NMI_LOCAL_APIC)
+ stop_apic_nmi_watchdog(NULL);
clear_local_APIC();
/*
@@ -1273,11 +1271,11 @@
printk(KERN_ERR "CPU %u didn't die...\n", cpu);
}
-__init int setup_additional_cpus(char *s)
+static __init int setup_additional_cpus(char *s)
{
- return get_option(&s, &additional_cpus);
+ return s && get_option(&s, &additional_cpus) ? 0 : -EINVAL;
}
-__setup("additional_cpus=", setup_additional_cpus);
+early_param("additional_cpus", setup_additional_cpus);
#else /* ... !CONFIG_HOTPLUG_CPU */
diff --git a/arch/x86_64/kernel/stacktrace.c b/arch/x86_64/kernel/stacktrace.c
index 32cf55e..6026b31 100644
--- a/arch/x86_64/kernel/stacktrace.c
+++ b/arch/x86_64/kernel/stacktrace.c
@@ -7,215 +7,49 @@
*/
#include <linux/sched.h>
#include <linux/stacktrace.h>
+#include <linux/module.h>
+#include <asm/stacktrace.h>
-#include <asm/smp.h>
-
-static inline int
-in_range(unsigned long start, unsigned long addr, unsigned long end)
+static void save_stack_warning(void *data, char *msg)
{
- return addr >= start && addr <= end;
}
-static unsigned long
-get_stack_end(struct task_struct *task, unsigned long stack)
+static void
+save_stack_warning_symbol(void *data, char *msg, unsigned long symbol)
{
- unsigned long stack_start, stack_end, flags;
- int i, cpu;
-
- /*
- * The most common case is that we are in the task stack:
- */
- stack_start = (unsigned long)task->thread_info;
- stack_end = stack_start + THREAD_SIZE;
-
- if (in_range(stack_start, stack, stack_end))
- return stack_end;
-
- /*
- * We are in an interrupt if irqstackptr is set:
- */
- raw_local_irq_save(flags);
- cpu = safe_smp_processor_id();
- stack_end = (unsigned long)cpu_pda(cpu)->irqstackptr;
-
- if (stack_end) {
- stack_start = stack_end & ~(IRQSTACKSIZE-1);
- if (in_range(stack_start, stack, stack_end))
- goto out_restore;
- /*
- * We get here if we are in an IRQ context but we
- * are also in an exception stack.
- */
- }
-
- /*
- * Iterate over all exception stacks, and figure out whether
- * 'stack' is in one of them:
- */
- for (i = 0; i < N_EXCEPTION_STACKS; i++) {
- /*
- * set 'end' to the end of the exception stack.
- */
- stack_end = per_cpu(init_tss, cpu).ist[i];
- stack_start = stack_end - EXCEPTION_STKSZ;
-
- /*
- * Is 'stack' above this exception frame's end?
- * If yes then skip to the next frame.
- */
- if (stack >= stack_end)
- continue;
- /*
- * Is 'stack' above this exception frame's start address?
- * If yes then we found the right frame.
- */
- if (stack >= stack_start)
- goto out_restore;
-
- /*
- * If this is a debug stack, and if it has a larger size than
- * the usual exception stacks, then 'stack' might still
- * be within the lower portion of the debug stack:
- */
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
- if (i == DEBUG_STACK - 1 && stack >= stack_end - DEBUG_STKSZ) {
- /*
- * Black magic. A large debug stack is composed of
- * multiple exception stack entries, which we
- * iterate through now. Dont look:
- */
- do {
- stack_end -= EXCEPTION_STKSZ;
- stack_start -= EXCEPTION_STKSZ;
- } while (stack < stack_start);
-
- goto out_restore;
- }
-#endif
- }
- /*
- * Ok, 'stack' is not pointing to any of the system stacks.
- */
- stack_end = 0;
-
-out_restore:
- raw_local_irq_restore(flags);
-
- return stack_end;
}
-
-/*
- * Save stack-backtrace addresses into a stack_trace buffer:
- */
-static inline unsigned long
-save_context_stack(struct stack_trace *trace, unsigned int skip,
- unsigned long stack, unsigned long stack_end)
+static int save_stack_stack(void *data, char *name)
{
- unsigned long addr;
-
-#ifdef CONFIG_FRAME_POINTER
- unsigned long prev_stack = 0;
-
- while (in_range(prev_stack, stack, stack_end)) {
- pr_debug("stack: %p\n", (void *)stack);
- addr = (unsigned long)(((unsigned long *)stack)[1]);
- pr_debug("addr: %p\n", (void *)addr);
- if (!skip)
- trace->entries[trace->nr_entries++] = addr-1;
- else
- skip--;
- if (trace->nr_entries >= trace->max_entries)
- break;
- if (!addr)
- return 0;
- /*
- * Stack frames must go forwards (otherwise a loop could
- * happen if the stackframe is corrupted), so we move
- * prev_stack forwards:
- */
- prev_stack = stack;
- stack = (unsigned long)(((unsigned long *)stack)[0]);
- }
- pr_debug("invalid: %p\n", (void *)stack);
-#else
- while (stack < stack_end) {
- addr = ((unsigned long *)stack)[0];
- stack += sizeof(long);
- if (__kernel_text_address(addr)) {
- if (!skip)
- trace->entries[trace->nr_entries++] = addr-1;
- else
- skip--;
- if (trace->nr_entries >= trace->max_entries)
- break;
- }
- }
-#endif
- return stack;
+ struct stack_trace *trace = (struct stack_trace *)data;
+ return trace->all_contexts ? 0 : -1;
}
-#define MAX_STACKS 10
+static void save_stack_address(void *data, unsigned long addr)
+{
+ struct stack_trace *trace = (struct stack_trace *)data;
+ if (trace->skip > 0) {
+ trace->skip--;
+ return;
+ }
+ if (trace->nr_entries < trace->max_entries - 1)
+ trace->entries[trace->nr_entries++] = addr;
+}
+
+static struct stacktrace_ops save_stack_ops = {
+ .warning = save_stack_warning,
+ .warning_symbol = save_stack_warning_symbol,
+ .stack = save_stack_stack,
+ .address = save_stack_address,
+};
/*
* Save stack-backtrace addresses into a stack_trace buffer.
- * If all_contexts is set, all contexts (hardirq, softirq and process)
- * are saved. If not set then only the current context is saved.
*/
-void save_stack_trace(struct stack_trace *trace,
- struct task_struct *task, int all_contexts,
- unsigned int skip)
+void save_stack_trace(struct stack_trace *trace, struct task_struct *task)
{
- unsigned long stack = (unsigned long)&stack;
- int i, nr_stacks = 0, stacks_done[MAX_STACKS];
-
- WARN_ON(trace->nr_entries || !trace->max_entries);
-
- if (!task)
- task = current;
-
- pr_debug("task: %p, ti: %p\n", task, task->thread_info);
-
- if (!task || task == current) {
- /* Grab rbp right from our regs: */
- asm ("mov %%rbp, %0" : "=r" (stack));
- pr_debug("rbp: %p\n", (void *)stack);
- } else {
- /* rbp is the last reg pushed by switch_to(): */
- stack = task->thread.rsp;
- pr_debug("other task rsp: %p\n", (void *)stack);
- stack = (unsigned long)(((unsigned long *)stack)[0]);
- pr_debug("other task rbp: %p\n", (void *)stack);
- }
-
- while (1) {
- unsigned long stack_end = get_stack_end(task, stack);
-
- pr_debug("stack: %p\n", (void *)stack);
- pr_debug("stack end: %p\n", (void *)stack_end);
-
- /*
- * Invalid stack addres?
- */
- if (!stack_end)
- return;
- /*
- * Were we in this stack already? (recursion)
- */
- for (i = 0; i < nr_stacks; i++)
- if (stacks_done[i] == stack_end)
- return;
- stacks_done[nr_stacks] = stack_end;
-
- stack = save_context_stack(trace, skip, stack, stack_end);
- if (!all_contexts || !stack ||
- trace->nr_entries >= trace->max_entries)
- return;
- trace->entries[trace->nr_entries++] = ULONG_MAX;
- if (trace->nr_entries >= trace->max_entries)
- return;
- if (++nr_stacks >= MAX_STACKS)
- return;
- }
+ dump_trace(task, NULL, NULL, &save_stack_ops, trace);
+ trace->entries[trace->nr_entries++] = ULONG_MAX;
}
+EXPORT_SYMBOL(save_stack_trace);
diff --git a/arch/x86_64/kernel/tce.c b/arch/x86_64/kernel/tce.c
index 5530dda..cbabfdf 100644
--- a/arch/x86_64/kernel/tce.c
+++ b/arch/x86_64/kernel/tce.c
@@ -1,4 +1,6 @@
/*
+ * This file manages the translation entries for the IBM Calgary IOMMU.
+ *
* Derived from arch/powerpc/platforms/pseries/iommu.c
*
* Copyright (C) IBM Corporation, 2006
@@ -104,14 +106,6 @@
/* set the tce table size - measured in entries */
tbl->it_size = table_size_to_number_of_entries(specified_table_size);
- tbl->it_base = (unsigned long)tce_table_kva[dev->bus->number];
- if (!tbl->it_base) {
- printk(KERN_ERR "Calgary: iommu_table_setparms: "
- "no table allocated?!\n");
- ret = -ENOMEM;
- goto done;
- }
-
/*
* number of bytes needed for the bitmap size in number of
* entries; we need one bit per entry
@@ -160,8 +154,6 @@
if (ret)
goto free_tbl;
- tce_free(tbl, 0, tbl->it_size);
-
tbl->bbar = bbar;
/*
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 7700e6c..1c255ee 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -24,6 +24,8 @@
#include <linux/device.h>
#include <linux/sysdev.h>
#include <linux/bcd.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
#include <linux/kallsyms.h>
#include <linux/acpi.h>
#ifdef CONFIG_ACPI
@@ -39,9 +41,7 @@
#include <asm/sections.h>
#include <linux/cpufreq.h>
#include <linux/hpet.h>
-#ifdef CONFIG_X86_LOCAL_APIC
#include <asm/apic.h>
-#endif
#ifdef CONFIG_CPU_FREQ
static void cpufreq_delayed_get(void);
@@ -49,7 +49,7 @@
extern void i8254_timer_resume(void);
extern int using_apic_timer;
-static char *time_init_gtod(void);
+static char *timename = NULL;
DEFINE_SPINLOCK(rtc_lock);
EXPORT_SYMBOL(rtc_lock);
@@ -187,20 +187,15 @@
{
unsigned long pc = instruction_pointer(regs);
- /* Assume the lock function has either no stack frame or only a single
- word. This checks if the address on the stack looks like a kernel
- text address.
- There is a small window for false hits, but in that case the tick
- is just accounted to the spinlock function.
- Better would be to write these functions in assembler again
- and check exactly. */
+ /* Assume the lock function has either no stack frame or a copy
+ of eflags from PUSHF
+ Eflags always has bits 22 and up cleared unlike kernel addresses. */
if (!user_mode(regs) && in_lock_functions(pc)) {
- char *v = *(char **)regs->rsp;
- if ((v >= _stext && v <= _etext) ||
- (v >= _sinittext && v <= _einittext) ||
- (v >= (char *)MODULES_VADDR && v <= (char *)MODULES_END))
- return (unsigned long)v;
- return ((unsigned long *)regs->rsp)[1];
+ unsigned long *sp = (unsigned long *)regs->rsp;
+ if (sp[0] >> 22)
+ return sp[0];
+ if (sp[1] >> 22)
+ return sp[1];
}
return pc;
}
@@ -281,6 +276,7 @@
* Note: This function is required to return accurate
* time even in the absence of multiple timer ticks.
*/
+static inline unsigned long long cycles_2_ns(unsigned long long cyc);
unsigned long long monotonic_clock(void)
{
unsigned long seq;
@@ -305,8 +301,7 @@
base = monotonic_base;
} while (read_seqretry(&xtime_lock, seq));
this_offset = get_cycles_sync();
- /* FIXME: 1000 or 1000000? */
- offset = (this_offset - last_offset)*1000 / cpu_khz;
+ offset = cycles_2_ns(this_offset - last_offset);
}
return base + offset;
}
@@ -410,8 +405,7 @@
offset %= USEC_PER_TICK;
}
- /* FIXME: 1000 or 1000000? */
- monotonic_base += (tsc - vxtime.last_tsc) * 1000000 / cpu_khz;
+ monotonic_base += cycles_2_ns(tsc - vxtime.last_tsc);
vxtime.last_tsc = tsc - vxtime.quot * delay / vxtime.tsc_quot;
@@ -441,12 +435,8 @@
* have to call the local interrupt handler.
*/
-#ifndef CONFIG_X86_LOCAL_APIC
- profile_tick(CPU_PROFILING, regs);
-#else
if (!using_apic_timer)
smp_local_timer_interrupt(regs);
-#endif
/*
* If we have an externally synchronized Linux clock, then update CMOS clock
@@ -470,10 +460,8 @@
if (apic_runs_main_timer > 1)
return IRQ_HANDLED;
main_timer_handler(regs);
-#ifdef CONFIG_X86_LOCAL_APIC
if (using_apic_timer)
smp_send_timer_broadcast_ipi();
-#endif
return IRQ_HANDLED;
}
@@ -893,11 +881,17 @@
timer_interrupt, IRQF_DISABLED, CPU_MASK_NONE, "timer", NULL, NULL
};
+static int __cpuinit
+time_cpu_notifier(struct notifier_block *nb, unsigned long action, void *hcpu)
+{
+ unsigned cpu = (unsigned long) hcpu;
+ if (action == CPU_ONLINE)
+ vsyscall_set_cpu(cpu);
+ return NOTIFY_DONE;
+}
+
void __init time_init(void)
{
- char *timename;
- char *gtod;
-
if (nohpet)
vxtime.hpet_address = 0;
@@ -931,18 +925,17 @@
}
vxtime.mode = VXTIME_TSC;
- gtod = time_init_gtod();
-
- printk(KERN_INFO "time.c: Using %ld.%06ld MHz WALL %s GTOD %s timer.\n",
- vxtime_hz / 1000000, vxtime_hz % 1000000, timename, gtod);
- printk(KERN_INFO "time.c: Detected %d.%03d MHz processor.\n",
- cpu_khz / 1000, cpu_khz % 1000);
vxtime.quot = (USEC_PER_SEC << US_SCALE) / vxtime_hz;
vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
vxtime.last_tsc = get_cycles_sync();
- setup_irq(0, &irq0);
-
set_cyc2ns_scale(cpu_khz);
+ setup_irq(0, &irq0);
+ hotcpu_notifier(time_cpu_notifier, 0);
+ time_cpu_notifier(NULL, CPU_ONLINE, (void *)(long)smp_processor_id());
+
+#ifndef CONFIG_SMP
+ time_init_gtod();
+#endif
}
/*
@@ -973,12 +966,18 @@
/*
* Decide what mode gettimeofday should use.
*/
-__init static char *time_init_gtod(void)
+void time_init_gtod(void)
{
char *timetype;
if (unsynchronized_tsc())
notsc = 1;
+
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_RDTSCP))
+ vgetcpu_mode = VGETCPU_RDTSCP;
+ else
+ vgetcpu_mode = VGETCPU_LSL;
+
if (vxtime.hpet_address && notsc) {
timetype = hpet_use_timer ? "HPET" : "PIT/HPET";
if (hpet_use_timer)
@@ -1001,7 +1000,16 @@
timetype = hpet_use_timer ? "HPET/TSC" : "PIT/TSC";
vxtime.mode = VXTIME_TSC;
}
- return timetype;
+
+ printk(KERN_INFO "time.c: Using %ld.%06ld MHz WALL %s GTOD %s timer.\n",
+ vxtime_hz / 1000000, vxtime_hz % 1000000, timename, timetype);
+ printk(KERN_INFO "time.c: Detected %d.%03d MHz processor.\n",
+ cpu_khz / 1000, cpu_khz % 1000);
+ vxtime.quot = (USEC_PER_SEC << US_SCALE) / vxtime_hz;
+ vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
+ vxtime.last_tsc = get_cycles_sync();
+
+ set_cyc2ns_scale(cpu_khz);
}
__setup("report_lost_ticks", time_setup);
@@ -1031,8 +1039,16 @@
unsigned long flags;
unsigned long sec;
unsigned long ctime = get_cmos_time();
- unsigned long sleep_length = (ctime - sleep_start) * HZ;
+ long sleep_length = (ctime - sleep_start) * HZ;
+ if (sleep_length < 0) {
+ printk(KERN_WARNING "Time skew detected in timer resume!\n");
+ /* The time after the resume must not be earlier than the time
+ * before the suspend or some nasty things will happen
+ */
+ sleep_length = 0;
+ ctime = sleep_start;
+ }
if (vxtime.hpet_address)
hpet_reenable();
else
diff --git a/arch/x86_64/kernel/trampoline.S b/arch/x86_64/kernel/trampoline.S
index 23a03eb9..c79b99a 100644
--- a/arch/x86_64/kernel/trampoline.S
+++ b/arch/x86_64/kernel/trampoline.S
@@ -64,7 +64,7 @@
.word 0, 0 # idt base = 0L
gdt_48:
- .short __KERNEL32_CS + 7 # gdt limit
+ .short GDT_ENTRIES*8 - 1 # gdt limit
.long cpu_gdt_table-__START_KERNEL_map
.globl trampoline_end
diff --git a/arch/x86_64/kernel/traps.c b/arch/x86_64/kernel/traps.c
index b124977..01f2a8d 100644
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -45,6 +45,7 @@
#include <asm/pda.h>
#include <asm/proto.h>
#include <asm/nmi.h>
+#include <asm/stacktrace.h>
asmlinkage void divide_error(void);
asmlinkage void debug(void);
@@ -142,7 +143,7 @@
#endif
static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
- unsigned *usedp, const char **idp)
+ unsigned *usedp, char **idp)
{
static char ids[][8] = {
[DEBUG_STACK - 1] = "#DB",
@@ -161,26 +162,7 @@
* 'stack' is in one of them:
*/
for (k = 0; k < N_EXCEPTION_STACKS; k++) {
- unsigned long end;
-
- /*
- * set 'end' to the end of the exception stack.
- */
- switch (k + 1) {
- /*
- * TODO: this block is not needed i think, because
- * setup64.c:cpu_init() sets up t->ist[DEBUG_STACK]
- * properly too.
- */
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
- case DEBUG_STACK:
- end = cpu_pda(cpu)->debugstack + DEBUG_STKSZ;
- break;
-#endif
- default:
- end = per_cpu(orig_ist, cpu).ist[k];
- break;
- }
+ unsigned long end = per_cpu(orig_ist, cpu).ist[k];
/*
* Is 'stack' above this exception frame's end?
* If yes then skip to the next frame.
@@ -234,13 +216,19 @@
return NULL;
}
-static int show_trace_unwind(struct unwind_frame_info *info, void *context)
+struct ops_and_data {
+ struct stacktrace_ops *ops;
+ void *data;
+};
+
+static int dump_trace_unwind(struct unwind_frame_info *info, void *context)
{
+ struct ops_and_data *oad = (struct ops_and_data *)context;
int n = 0;
while (unwind(info) == 0 && UNW_PC(info)) {
n++;
- printk_address(UNW_PC(info));
+ oad->ops->address(oad->data, UNW_PC(info));
if (arch_unw_user_mode(info))
break;
}
@@ -254,45 +242,53 @@
* severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
*/
-void show_trace(struct task_struct *tsk, struct pt_regs *regs, unsigned long * stack)
+void dump_trace(struct task_struct *tsk, struct pt_regs *regs, unsigned long * stack,
+ struct stacktrace_ops *ops, void *data)
{
- const unsigned cpu = safe_smp_processor_id();
+ const unsigned cpu = smp_processor_id();
unsigned long *irqstack_end = (unsigned long *)cpu_pda(cpu)->irqstackptr;
unsigned used = 0;
- printk("\nCall Trace:\n");
-
if (!tsk)
tsk = current;
if (call_trace >= 0) {
int unw_ret = 0;
struct unwind_frame_info info;
+ struct ops_and_data oad = { .ops = ops, .data = data };
if (regs) {
if (unwind_init_frame_info(&info, tsk, regs) == 0)
- unw_ret = show_trace_unwind(&info, NULL);
+ unw_ret = dump_trace_unwind(&info, &oad);
} else if (tsk == current)
- unw_ret = unwind_init_running(&info, show_trace_unwind, NULL);
+ unw_ret = unwind_init_running(&info, dump_trace_unwind, &oad);
else {
if (unwind_init_blocked(&info, tsk) == 0)
- unw_ret = show_trace_unwind(&info, NULL);
+ unw_ret = dump_trace_unwind(&info, &oad);
}
if (unw_ret > 0) {
if (call_trace == 1 && !arch_unw_user_mode(&info)) {
- print_symbol("DWARF2 unwinder stuck at %s\n",
+ ops->warning_symbol(data, "DWARF2 unwinder stuck at %s\n",
UNW_PC(&info));
if ((long)UNW_SP(&info) < 0) {
- printk("Leftover inexact backtrace:\n");
+ ops->warning(data, "Leftover inexact backtrace:\n");
stack = (unsigned long *)UNW_SP(&info);
+ if (!stack)
+ return;
} else
- printk("Full inexact backtrace again:\n");
+ ops->warning(data, "Full inexact backtrace again:\n");
} else if (call_trace >= 1)
return;
else
- printk("Full inexact backtrace again:\n");
+ ops->warning(data, "Full inexact backtrace again:\n");
} else
- printk("Inexact backtrace:\n");
+ ops->warning(data, "Inexact backtrace:\n");
+ }
+ if (!stack) {
+ unsigned long dummy;
+ stack = &dummy;
+ if (tsk && tsk != current)
+ stack = (unsigned long *)tsk->thread.rsp;
}
/*
@@ -303,7 +299,9 @@
#define HANDLE_STACK(cond) \
do while (cond) { \
unsigned long addr = *stack++; \
- if (kernel_text_address(addr)) { \
+ if (oops_in_progress ? \
+ __kernel_text_address(addr) : \
+ kernel_text_address(addr)) { \
/* \
* If the address is either in the text segment of the \
* kernel, or in the region which contains vmalloc'ed \
@@ -312,7 +310,7 @@
* down the cause of the crash will be able to figure \
* out the call path that was taken. \
*/ \
- printk_address(addr); \
+ ops->address(data, addr); \
} \
} while (0)
@@ -321,16 +319,17 @@
* current stack address. If the stacks consist of nested
* exceptions
*/
- for ( ; ; ) {
- const char *id;
+ for (;;) {
+ char *id;
unsigned long *estack_end;
estack_end = in_exception_stack(cpu, (unsigned long)stack,
&used, &id);
if (estack_end) {
- printk(" <%s>", id);
+ if (ops->stack(data, id) < 0)
+ break;
HANDLE_STACK (stack < estack_end);
- printk(" <EOE>");
+ ops->stack(data, "<EOE>");
/*
* We link to the next stack via the
* second-to-last pointer (index -2 to end) in the
@@ -345,7 +344,8 @@
(IRQSTACKSIZE - 64) / sizeof(*irqstack);
if (stack >= irqstack && stack < irqstack_end) {
- printk(" <IRQ>");
+ if (ops->stack(data, "IRQ") < 0)
+ break;
HANDLE_STACK (stack < irqstack_end);
/*
* We link to the next stack (which would be
@@ -354,7 +354,7 @@
*/
stack = (unsigned long *) (irqstack_end[-1]);
irqstack_end = NULL;
- printk(" <EOI>");
+ ops->stack(data, "EOI");
continue;
}
}
@@ -362,19 +362,57 @@
}
/*
- * This prints the process stack:
+ * This handles the process stack:
*/
HANDLE_STACK (((long) stack & (THREAD_SIZE-1)) != 0);
#undef HANDLE_STACK
+}
+EXPORT_SYMBOL(dump_trace);
+static void
+print_trace_warning_symbol(void *data, char *msg, unsigned long symbol)
+{
+ print_symbol(msg, symbol);
printk("\n");
}
-static void _show_stack(struct task_struct *tsk, struct pt_regs *regs, unsigned long * rsp)
+static void print_trace_warning(void *data, char *msg)
+{
+ printk("%s\n", msg);
+}
+
+static int print_trace_stack(void *data, char *name)
+{
+ printk(" <%s> ", name);
+ return 0;
+}
+
+static void print_trace_address(void *data, unsigned long addr)
+{
+ printk_address(addr);
+}
+
+static struct stacktrace_ops print_trace_ops = {
+ .warning = print_trace_warning,
+ .warning_symbol = print_trace_warning_symbol,
+ .stack = print_trace_stack,
+ .address = print_trace_address,
+};
+
+void
+show_trace(struct task_struct *tsk, struct pt_regs *regs, unsigned long *stack)
+{
+ printk("\nCall Trace:\n");
+ dump_trace(tsk, regs, stack, &print_trace_ops, NULL);
+ printk("\n");
+}
+
+static void
+_show_stack(struct task_struct *tsk, struct pt_regs *regs, unsigned long *rsp)
{
unsigned long *stack;
int i;
- const int cpu = safe_smp_processor_id();
+ const int cpu = smp_processor_id();
unsigned long *irqstack_end = (unsigned long *) (cpu_pda(cpu)->irqstackptr);
unsigned long *irqstack = (unsigned long *) (cpu_pda(cpu)->irqstackptr - IRQSTACKSIZE);
@@ -428,7 +466,7 @@
int i;
int in_kernel = !user_mode(regs);
unsigned long rsp;
- const int cpu = safe_smp_processor_id();
+ const int cpu = smp_processor_id();
struct task_struct *cur = cpu_pda(cpu)->pcurrent;
rsp = regs->rsp;
@@ -503,9 +541,11 @@
unsigned __kprobes long oops_begin(void)
{
- int cpu = safe_smp_processor_id();
+ int cpu = smp_processor_id();
unsigned long flags;
+ oops_enter();
+
/* racy, but better than risking deadlock. */
local_irq_save(flags);
if (!spin_trylock(&die_lock)) {
@@ -534,6 +574,7 @@
spin_unlock_irqrestore(&die_lock, flags);
if (panic_on_oops)
panic("Fatal exception");
+ oops_exit();
}
void __kprobes __die(const char * str, struct pt_regs * regs, long err)
@@ -570,7 +611,7 @@
do_exit(SIGSEGV);
}
-void __kprobes die_nmi(char *str, struct pt_regs *regs)
+void __kprobes die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
unsigned long flags = oops_begin();
@@ -578,13 +619,12 @@
* We are in trouble anyway, lets at least try
* to get a message out.
*/
- printk(str, safe_smp_processor_id());
+ printk(str, smp_processor_id());
show_registers(regs);
if (kexec_should_crash(current))
crash_kexec(regs);
- if (panic_on_timeout || panic_on_oops)
- panic("nmi watchdog");
- printk("console shuts up ...\n");
+ if (do_panic || panic_on_oops)
+ panic("Non maskable interrupt");
oops_end(flags);
nmi_exit();
local_irq_enable();
@@ -730,8 +770,15 @@
static __kprobes void
mem_parity_error(unsigned char reason, struct pt_regs * regs)
{
- printk("Uhhuh. NMI received. Dazed and confused, but trying to continue\n");
- printk("You probably have a hardware problem with your RAM chips\n");
+ printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x.\n",
+ reason);
+ printk(KERN_EMERG "You probably have a hardware problem with your "
+ "RAM chips\n");
+
+ if (panic_on_unrecovered_nmi)
+ panic("NMI: Not continuing");
+
+ printk(KERN_EMERG "Dazed and confused, but trying to continue\n");
/* Clear and disable the memory parity error line. */
reason = (reason & 0xf) | 4;
@@ -754,9 +801,15 @@
static __kprobes void
unknown_nmi_error(unsigned char reason, struct pt_regs * regs)
-{ printk("Uhhuh. NMI received for unknown reason %02x.\n", reason);
- printk("Dazed and confused, but trying to continue\n");
- printk("Do you have a strange power saving mode enabled?\n");
+{
+ printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x.\n",
+ reason);
+ printk(KERN_EMERG "Do you have a strange power saving mode enabled?\n");
+
+ if (panic_on_unrecovered_nmi)
+ panic("NMI: Not continuing");
+
+ printk(KERN_EMERG "Dazed and confused, but trying to continue\n");
}
/* Runs on IST stack. This code must keep interrupts off all the time.
@@ -776,17 +829,15 @@
if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
== NOTIFY_STOP)
return;
-#ifdef CONFIG_X86_LOCAL_APIC
/*
* Ok, so this is none of the documented NMI sources,
* so it must be the NMI watchdog.
*/
- if (nmi_watchdog > 0) {
- nmi_watchdog_tick(regs,reason);
+ if (nmi_watchdog_tick(regs,reason))
return;
- }
-#endif
- unknown_nmi_error(reason, regs);
+ if (!do_nmi_callback(regs,cpu))
+ unknown_nmi_error(reason, regs);
+
return;
}
if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP)
@@ -1071,6 +1122,7 @@
init_fpu(me);
restore_fpu_checking(&me->thread.i387.fxsave);
task_thread_info(me)->status |= TS_USEDFPU;
+ me->fpu_counter++;
}
void __init trap_init(void)
@@ -1109,24 +1161,30 @@
}
-/* Actual parsing is done early in setup.c. */
-static int __init oops_dummy(char *s)
+static int __init oops_setup(char *s)
{
- panic_on_oops = 1;
- return 1;
+ if (!s)
+ return -EINVAL;
+ if (!strcmp(s, "panic"))
+ panic_on_oops = 1;
+ return 0;
}
-__setup("oops=", oops_dummy);
+early_param("oops", oops_setup);
static int __init kstack_setup(char *s)
{
+ if (!s)
+ return -EINVAL;
kstack_depth_to_print = simple_strtoul(s,NULL,0);
- return 1;
+ return 0;
}
-__setup("kstack=", kstack_setup);
+early_param("kstack", kstack_setup);
#ifdef CONFIG_STACK_UNWIND
static int __init call_trace_setup(char *s)
{
+ if (!s)
+ return -EINVAL;
if (strcmp(s, "old") == 0)
call_trace = -1;
else if (strcmp(s, "both") == 0)
@@ -1135,7 +1193,7 @@
call_trace = 1;
else if (strcmp(s, "new") == 0)
call_trace = 2;
- return 1;
+ return 0;
}
-__setup("call_trace=", call_trace_setup);
+early_param("call_trace", call_trace_setup);
#endif
diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S
index 7c4de31..d0564f1 100644
--- a/arch/x86_64/kernel/vmlinux.lds.S
+++ b/arch/x86_64/kernel/vmlinux.lds.S
@@ -13,6 +13,12 @@
OUTPUT_ARCH(i386:x86-64)
ENTRY(phys_startup_64)
jiffies_64 = jiffies;
+PHDRS {
+ text PT_LOAD FLAGS(5); /* R_E */
+ data PT_LOAD FLAGS(7); /* RWE */
+ user PT_LOAD FLAGS(7); /* RWE */
+ note PT_NOTE FLAGS(4); /* R__ */
+}
SECTIONS
{
. = __START_KERNEL;
@@ -31,7 +37,7 @@
KPROBES_TEXT
*(.fixup)
*(.gnu.warning)
- } = 0x9090
+ } :text = 0x9090
/* out-of-line lock text */
.text.lock : AT(ADDR(.text.lock) - LOAD_OFFSET) { *(.text.lock) }
@@ -57,7 +63,7 @@
.data : AT(ADDR(.data) - LOAD_OFFSET) {
*(.data)
CONSTRUCTORS
- }
+ } :data
_edata = .; /* End of data section */
@@ -89,7 +95,7 @@
#define VVIRT(x) (ADDR(x) - VVIRT_OFFSET)
. = VSYSCALL_ADDR;
- .vsyscall_0 : AT(VSYSCALL_PHYS_ADDR) { *(.vsyscall_0) }
+ .vsyscall_0 : AT(VSYSCALL_PHYS_ADDR) { *(.vsyscall_0) } :user
__vsyscall_0 = VSYSCALL_VIRT_ADDR;
. = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
@@ -99,6 +105,9 @@
.vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) }
vxtime = VVIRT(.vxtime);
+ .vgetcpu_mode : AT(VLOAD(.vgetcpu_mode)) { *(.vgetcpu_mode) }
+ vgetcpu_mode = VVIRT(.vgetcpu_mode);
+
.wall_jiffies : AT(VLOAD(.wall_jiffies)) { *(.wall_jiffies) }
wall_jiffies = VVIRT(.wall_jiffies);
@@ -132,7 +141,7 @@
. = ALIGN(8192); /* init_task */
.data.init_task : AT(ADDR(.data.init_task) - LOAD_OFFSET) {
*(.data.init_task)
- }
+ } :data
. = ALIGN(4096);
.data.page_aligned : AT(ADDR(.data.page_aligned) - LOAD_OFFSET) {
@@ -207,14 +216,12 @@
__initramfs_start = .;
.init.ramfs : AT(ADDR(.init.ramfs) - LOAD_OFFSET) { *(.init.ramfs) }
__initramfs_end = .;
- /* temporary here to work around NR_CPUS. If you see this comment in 2.6.17+
- complain */
- . = ALIGN(4096);
- __init_end = .;
- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
__per_cpu_start = .;
.data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
__per_cpu_end = .;
+ . = ALIGN(4096);
+ __init_end = .;
. = ALIGN(4096);
__nosave_begin = .;
diff --git a/arch/x86_64/kernel/vsmp.c b/arch/x86_64/kernel/vsmp.c
index 92f70c7..044e852 100644
--- a/arch/x86_64/kernel/vsmp.c
+++ b/arch/x86_64/kernel/vsmp.c
@@ -20,6 +20,9 @@
void *address;
unsigned int cap, ctl;
+ if (!early_pci_allowed())
+ return 0;
+
/* Check if we are running on a ScaleMP vSMP box */
if ((read_pci_config_16(0, 0x1f, 0, PCI_VENDOR_ID) != PCI_VENDOR_ID_SCALEMP) ||
(read_pci_config_16(0, 0x1f, 0, PCI_DEVICE_ID) != PCI_DEVICE_ID_SCALEMP_VSMP_CTL))
diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
index f603037..ac48c38 100644
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -26,6 +26,7 @@
#include <linux/seqlock.h>
#include <linux/jiffies.h>
#include <linux/sysctl.h>
+#include <linux/getcpu.h>
#include <asm/vsyscall.h>
#include <asm/pgtable.h>
@@ -33,11 +34,15 @@
#include <asm/fixmap.h>
#include <asm/errno.h>
#include <asm/io.h>
+#include <asm/segment.h>
+#include <asm/desc.h>
+#include <asm/topology.h>
#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
int __sysctl_vsyscall __section_sysctl_vsyscall = 1;
seqlock_t __xtime_lock __section_xtime_lock = SEQLOCK_UNLOCKED;
+int __vgetcpu_mode __section_vgetcpu_mode;
#include <asm/unistd.h>
@@ -72,7 +77,8 @@
__vxtime.tsc_quot) >> 32;
/* See comment in x86_64 do_gettimeofday. */
} else {
- usec += ((readl((void *)fix_to_virt(VSYSCALL_HPET) + 0xf0) -
+ usec += ((readl((void __iomem *)
+ fix_to_virt(VSYSCALL_HPET) + 0xf0) -
__vxtime.last) * __vxtime.quot) >> 32;
}
} while (read_seqretry(&__xtime_lock, sequence));
@@ -127,9 +133,46 @@
return __xtime.tv_sec;
}
-long __vsyscall(2) venosys_0(void)
+/* Fast way to get current CPU and node.
+ This helps to do per node and per CPU caches in user space.
+ The result is not guaranteed without CPU affinity, but usually
+ works out because the scheduler tries to keep a thread on the same
+ CPU.
+
+ tcache must point to a two element sized long array.
+ All arguments can be NULL. */
+long __vsyscall(2)
+vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
{
- return -ENOSYS;
+ unsigned int dummy, p;
+ unsigned long j = 0;
+
+ /* Fast cache - only recompute value once per jiffies and avoid
+ relatively costly rdtscp/cpuid otherwise.
+ This works because the scheduler usually keeps the process
+ on the same CPU and this syscall doesn't guarantee its
+ results anyways.
+ We do this here because otherwise user space would do it on
+ its own in a likely inferior way (no access to jiffies).
+ If you don't like it pass NULL. */
+ if (tcache && tcache->t0 == (j = __jiffies)) {
+ p = tcache->t1;
+ } else if (__vgetcpu_mode == VGETCPU_RDTSCP) {
+ /* Load per CPU data from RDTSCP */
+ rdtscp(dummy, dummy, p);
+ } else {
+ /* Load per CPU data from GDT */
+ asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+ }
+ if (tcache) {
+ tcache->t0 = j;
+ tcache->t1 = p;
+ }
+ if (cpu)
+ *cpu = p & 0xfff;
+ if (node)
+ *node = p >> 12;
+ return 0;
}
long __vsyscall(3) venosys_1(void)
@@ -149,7 +192,8 @@
void __user *buffer, size_t *lenp, loff_t *ppos)
{
extern u16 vsysc1, vsysc2;
- u16 *map1, *map2;
+ u16 __iomem *map1;
+ u16 __iomem *map2;
int ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos);
if (!write)
return ret;
@@ -164,11 +208,11 @@
goto out;
}
if (!sysctl_vsyscall) {
- *map1 = SYSCALL;
- *map2 = SYSCALL;
+ writew(SYSCALL, map1);
+ writew(SYSCALL, map2);
} else {
- *map1 = NOP2;
- *map2 = NOP2;
+ writew(NOP2, map1);
+ writew(NOP2, map2);
}
iounmap(map2);
out:
@@ -200,6 +244,43 @@
#endif
+static void __cpuinit write_rdtscp_cb(void *info)
+{
+ write_rdtscp_aux((unsigned long)info);
+}
+
+void __cpuinit vsyscall_set_cpu(int cpu)
+{
+ unsigned long *d;
+ unsigned long node = 0;
+#ifdef CONFIG_NUMA
+ node = cpu_to_node[cpu];
+#endif
+ if (cpu_has(&cpu_data[cpu], X86_FEATURE_RDTSCP)) {
+ void *info = (void *)((node << 12) | cpu);
+ /* Can happen on preemptive kernel */
+ if (get_cpu() == cpu)
+ write_rdtscp_cb(info);
+#ifdef CONFIG_SMP
+ else {
+ /* the notifier is unfortunately not executed on the
+ target CPU */
+ smp_call_function_single(cpu,write_rdtscp_cb,info,0,1);
+ }
+#endif
+ put_cpu();
+ }
+
+ /* Store cpu number in limit so that it can be loaded quickly
+ in user space in vgetcpu.
+ 12 bits for the CPU and 8 bits for the node. */
+ d = (unsigned long *)(cpu_gdt(cpu) + GDT_ENTRY_PER_CPU);
+ *d = 0x0f40000000000ULL;
+ *d |= cpu;
+ *d |= (node & 0xf) << 12;
+ *d |= (node >> 4) << 48;
+}
+
static void __init map_vsyscall(void)
{
extern char __vsyscall_0;
@@ -214,6 +295,7 @@
VSYSCALL_ADDR(__NR_vgettimeofday)));
BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
+ BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
map_vsyscall();
#ifdef CONFIG_SYSCTL
register_sysctl_table(kernel_root_table2, 0);
diff --git a/arch/x86_64/kernel/x8664_ksyms.c b/arch/x86_64/kernel/x8664_ksyms.c
index 370952c..c3454af 100644
--- a/arch/x86_64/kernel/x8664_ksyms.c
+++ b/arch/x86_64/kernel/x8664_ksyms.c
@@ -29,6 +29,7 @@
EXPORT_SYMBOL(copy_user_generic);
EXPORT_SYMBOL(copy_from_user);
EXPORT_SYMBOL(copy_to_user);
+EXPORT_SYMBOL(__copy_from_user_inatomic);
EXPORT_SYMBOL(copy_page);
EXPORT_SYMBOL(clear_page);
diff --git a/arch/x86_64/lib/Makefile b/arch/x86_64/lib/Makefile
index ccef6ae..b78d417 100644
--- a/arch/x86_64/lib/Makefile
+++ b/arch/x86_64/lib/Makefile
@@ -9,4 +9,4 @@
lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
usercopy.o getuser.o putuser.o \
thunk.o clear_page.o copy_page.o bitstr.o bitops.o
-lib-y += memcpy.o memmove.o memset.o copy_user.o
+lib-y += memcpy.o memmove.o memset.o copy_user.o rwlock.o
diff --git a/arch/x86_64/lib/clear_page.S b/arch/x86_64/lib/clear_page.S
index 1f81b79..9a10a78 100644
--- a/arch/x86_64/lib/clear_page.S
+++ b/arch/x86_64/lib/clear_page.S
@@ -1,10 +1,22 @@
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
/*
* Zero a page.
* rdi page
*/
- .globl clear_page
- .p2align 4
-clear_page:
+ ALIGN
+clear_page_c:
+ CFI_STARTPROC
+ movl $4096/8,%ecx
+ xorl %eax,%eax
+ rep stosq
+ ret
+ CFI_ENDPROC
+ENDPROC(clear_page)
+
+ENTRY(clear_page)
+ CFI_STARTPROC
xorl %eax,%eax
movl $4096/64,%ecx
.p2align 4
@@ -23,28 +35,25 @@
jnz .Lloop
nop
ret
-clear_page_end:
+ CFI_ENDPROC
+.Lclear_page_end:
+ENDPROC(clear_page)
/* Some CPUs run faster using the string instructions.
It is also a lot simpler. Use this when possible */
#include <asm/cpufeature.h>
+ .section .altinstr_replacement,"ax"
+1: .byte 0xeb /* jmp <disp8> */
+ .byte (clear_page_c - clear_page) - (2f - 1b) /* offset */
+2:
+ .previous
.section .altinstructions,"a"
.align 8
- .quad clear_page
- .quad clear_page_c
- .byte X86_FEATURE_REP_GOOD
- .byte clear_page_end-clear_page
- .byte clear_page_c_end-clear_page_c
- .previous
-
- .section .altinstr_replacement,"ax"
-clear_page_c:
- movl $4096/8,%ecx
- xorl %eax,%eax
- rep
- stosq
- ret
-clear_page_c_end:
+ .quad clear_page
+ .quad 1b
+ .byte X86_FEATURE_REP_GOOD
+ .byte .Lclear_page_end - clear_page
+ .byte 2b - 1b
.previous
diff --git a/arch/x86_64/lib/copy_page.S b/arch/x86_64/lib/copy_page.S
index 8fa19d9..0ebb03b 100644
--- a/arch/x86_64/lib/copy_page.S
+++ b/arch/x86_64/lib/copy_page.S
@@ -1,17 +1,33 @@
/* Written 2003 by Andi Kleen, based on a kernel by Evandro Menezes */
+#include <linux/config.h>
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
+ ALIGN
+copy_page_c:
+ CFI_STARTPROC
+ movl $4096/8,%ecx
+ rep movsq
+ ret
+ CFI_ENDPROC
+ENDPROC(copy_page_c)
+
/* Don't use streaming store because it's better when the target
ends up in cache. */
/* Could vary the prefetch distance based on SMP/UP */
- .globl copy_page
- .p2align 4
-copy_page:
+ENTRY(copy_page)
+ CFI_STARTPROC
subq $3*8,%rsp
+ CFI_ADJUST_CFA_OFFSET 3*8
movq %rbx,(%rsp)
+ CFI_REL_OFFSET rbx, 0
movq %r12,1*8(%rsp)
+ CFI_REL_OFFSET r12, 1*8
movq %r13,2*8(%rsp)
+ CFI_REL_OFFSET r13, 2*8
movl $(4096/64)-5,%ecx
.p2align 4
@@ -72,30 +88,33 @@
jnz .Loop2
movq (%rsp),%rbx
+ CFI_RESTORE rbx
movq 1*8(%rsp),%r12
+ CFI_RESTORE r12
movq 2*8(%rsp),%r13
+ CFI_RESTORE r13
addq $3*8,%rsp
+ CFI_ADJUST_CFA_OFFSET -3*8
ret
+.Lcopy_page_end:
+ CFI_ENDPROC
+ENDPROC(copy_page)
/* Some CPUs run faster using the string copy instructions.
It is also a lot simpler. Use this when possible */
#include <asm/cpufeature.h>
+ .section .altinstr_replacement,"ax"
+1: .byte 0xeb /* jmp <disp8> */
+ .byte (copy_page_c - copy_page) - (2f - 1b) /* offset */
+2:
+ .previous
.section .altinstructions,"a"
.align 8
- .quad copy_page
- .quad copy_page_c
- .byte X86_FEATURE_REP_GOOD
- .byte copy_page_c_end-copy_page_c
- .byte copy_page_c_end-copy_page_c
- .previous
-
- .section .altinstr_replacement,"ax"
-copy_page_c:
- movl $4096/8,%ecx
- rep
- movsq
- ret
-copy_page_c_end:
+ .quad copy_page
+ .quad 1b
+ .byte X86_FEATURE_REP_GOOD
+ .byte .Lcopy_page_end - copy_page
+ .byte 2b - 1b
.previous
diff --git a/arch/x86_64/lib/copy_user.S b/arch/x86_64/lib/copy_user.S
index f64569b..70bebd3 100644
--- a/arch/x86_64/lib/copy_user.S
+++ b/arch/x86_64/lib/copy_user.S
@@ -4,56 +4,78 @@
* Functions to copy from and to user space.
*/
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
#define FIX_ALIGNMENT 1
- #include <asm/current.h>
- #include <asm/asm-offsets.h>
- #include <asm/thread_info.h>
- #include <asm/cpufeature.h>
+#include <asm/current.h>
+#include <asm/asm-offsets.h>
+#include <asm/thread_info.h>
+#include <asm/cpufeature.h>
+
+ .macro ALTERNATIVE_JUMP feature,orig,alt
+0:
+ .byte 0xe9 /* 32bit jump */
+ .long \orig-1f /* by default jump to orig */
+1:
+ .section .altinstr_replacement,"ax"
+2: .byte 0xe9 /* near jump with 32bit immediate */
+ .long \alt-1b /* offset */ /* or alternatively to alt */
+ .previous
+ .section .altinstructions,"a"
+ .align 8
+ .quad 0b
+ .quad 2b
+ .byte \feature /* when feature is set */
+ .byte 5
+ .byte 5
+ .previous
+ .endm
/* Standard copy_to_user with segment limit checking */
- .globl copy_to_user
- .p2align 4
-copy_to_user:
+ENTRY(copy_to_user)
+ CFI_STARTPROC
GET_THREAD_INFO(%rax)
movq %rdi,%rcx
addq %rdx,%rcx
jc bad_to_user
cmpq threadinfo_addr_limit(%rax),%rcx
jae bad_to_user
-2:
- .byte 0xe9 /* 32bit jump */
- .long .Lcug-1f
-1:
+ xorl %eax,%eax /* clear zero flag */
+ ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+ CFI_ENDPROC
- .section .altinstr_replacement,"ax"
-3: .byte 0xe9 /* replacement jmp with 8 bit immediate */
- .long copy_user_generic_c-1b /* offset */
- .previous
- .section .altinstructions,"a"
- .align 8
- .quad 2b
- .quad 3b
- .byte X86_FEATURE_REP_GOOD
- .byte 5
- .byte 5
- .previous
+ENTRY(copy_user_generic)
+ CFI_STARTPROC
+ movl $1,%ecx /* set zero flag */
+ ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+ CFI_ENDPROC
+
+ENTRY(__copy_from_user_inatomic)
+ CFI_STARTPROC
+ xorl %ecx,%ecx /* clear zero flag */
+ ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+ CFI_ENDPROC
/* Standard copy_from_user with segment limit checking */
- .globl copy_from_user
- .p2align 4
-copy_from_user:
+ENTRY(copy_from_user)
+ CFI_STARTPROC
GET_THREAD_INFO(%rax)
movq %rsi,%rcx
addq %rdx,%rcx
jc bad_from_user
cmpq threadinfo_addr_limit(%rax),%rcx
jae bad_from_user
- /* FALL THROUGH to copy_user_generic */
+ movl $1,%ecx /* set zero flag */
+ ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+ CFI_ENDPROC
+ENDPROC(copy_from_user)
.section .fixup,"ax"
/* must zero dest */
bad_from_user:
+ CFI_STARTPROC
movl %edx,%ecx
xorl %eax,%eax
rep
@@ -61,40 +83,32 @@
bad_to_user:
movl %edx,%eax
ret
+ CFI_ENDPROC
+END(bad_from_user)
.previous
/*
- * copy_user_generic - memory copy with exception handling.
+ * copy_user_generic_unrolled - memory copy with exception handling.
+ * This version is for CPUs like P4 that don't have efficient micro code for rep movsq
*
* Input:
* rdi destination
* rsi source
* rdx count
+ * ecx zero flag -- if true zero destination on error
*
* Output:
* eax uncopied bytes or 0 if successful.
*/
- .globl copy_user_generic
- .p2align 4
-copy_user_generic:
- .byte 0x66,0x66,0x90 /* 5 byte nop for replacement jump */
- .byte 0x66,0x90
-1:
- .section .altinstr_replacement,"ax"
-2: .byte 0xe9 /* near jump with 32bit immediate */
- .long copy_user_generic_c-1b /* offset */
- .previous
- .section .altinstructions,"a"
- .align 8
- .quad copy_user_generic
- .quad 2b
- .byte X86_FEATURE_REP_GOOD
- .byte 5
- .byte 5
- .previous
-.Lcug:
+ENTRY(copy_user_generic_unrolled)
+ CFI_STARTPROC
pushq %rbx
+ CFI_ADJUST_CFA_OFFSET 8
+ CFI_REL_OFFSET rbx, 0
+ pushq %rcx
+ CFI_ADJUST_CFA_OFFSET 8
+ CFI_REL_OFFSET rcx, 0
xorl %eax,%eax /*zero for the exception handler */
#ifdef FIX_ALIGNMENT
@@ -168,9 +182,16 @@
decl %ecx
jnz .Lloop_1
+ CFI_REMEMBER_STATE
.Lende:
+ popq %rcx
+ CFI_ADJUST_CFA_OFFSET -8
+ CFI_RESTORE rcx
popq %rbx
+ CFI_ADJUST_CFA_OFFSET -8
+ CFI_RESTORE rbx
ret
+ CFI_RESTORE_STATE
#ifdef FIX_ALIGNMENT
/* align destination */
@@ -252,6 +273,8 @@
addl %ecx,%edx
/* edx: bytes to zero, rdi: dest, eax:zero */
.Lzero_rest:
+ cmpl $0,(%rsp)
+ jz .Le_zero
movq %rdx,%rcx
.Le_byte:
xorl %eax,%eax
@@ -261,6 +284,9 @@
.Le_zero:
movq %rdx,%rax
jmp .Lende
+ CFI_ENDPROC
+ENDPROC(copy_user_generic)
+
/* Some CPUs run faster using the string copy instructions.
This is also a lot simpler. Use them when possible.
@@ -270,6 +296,7 @@
/* rdi destination
* rsi source
* rdx count
+ * ecx zero flag
*
* Output:
* eax uncopied bytes or 0 if successfull.
@@ -280,22 +307,48 @@
* And more would be dangerous because both Intel and AMD have
* errata with rep movsq > 4GB. If someone feels the need to fix
* this please consider this.
- */
-copy_user_generic_c:
+ */
+ENTRY(copy_user_generic_string)
+ CFI_STARTPROC
+ movl %ecx,%r8d /* save zero flag */
movl %edx,%ecx
shrl $3,%ecx
andl $7,%edx
+ jz 10f
1: rep
movsq
movl %edx,%ecx
2: rep
movsb
-4: movl %ecx,%eax
+9: movl %ecx,%eax
ret
-3: lea (%rdx,%rcx,8),%rax
+
+ /* multiple of 8 byte */
+10: rep
+ movsq
+ xor %eax,%eax
ret
+ /* exception handling */
+3: lea (%rdx,%rcx,8),%rax /* exception on quad loop */
+ jmp 6f
+5: movl %ecx,%eax /* exception on byte loop */
+ /* eax: left over bytes */
+6: testl %r8d,%r8d /* zero flag set? */
+ jz 7f
+ movl %eax,%ecx /* initialize x86 loop counter */
+ push %rax
+ xorl %eax,%eax
+8: rep
+ stosb /* zero the rest */
+11: pop %rax
+7: ret
+ CFI_ENDPROC
+END(copy_user_generic_c)
+
.section __ex_table,"a"
.quad 1b,3b
- .quad 2b,4b
+ .quad 2b,5b
+ .quad 8b,11b
+ .quad 10b,3b
.previous
diff --git a/arch/x86_64/lib/csum-copy.S b/arch/x86_64/lib/csum-copy.S
index 72fd55e..f0dba36 100644
--- a/arch/x86_64/lib/csum-copy.S
+++ b/arch/x86_64/lib/csum-copy.S
@@ -5,8 +5,9 @@
* License. See the file COPYING in the main directory of this archive
* for more details. No warranty for anything given at all.
*/
- #include <linux/linkage.h>
- #include <asm/errno.h>
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/errno.h>
/*
* Checksum copy with exception handling.
@@ -53,19 +54,24 @@
.endm
- .globl csum_partial_copy_generic
- .p2align 4
-csum_partial_copy_generic:
+ENTRY(csum_partial_copy_generic)
+ CFI_STARTPROC
cmpl $3*64,%edx
jle .Lignore
.Lignore:
subq $7*8,%rsp
+ CFI_ADJUST_CFA_OFFSET 7*8
movq %rbx,2*8(%rsp)
+ CFI_REL_OFFSET rbx, 2*8
movq %r12,3*8(%rsp)
+ CFI_REL_OFFSET r12, 3*8
movq %r14,4*8(%rsp)
+ CFI_REL_OFFSET r14, 4*8
movq %r13,5*8(%rsp)
+ CFI_REL_OFFSET r13, 5*8
movq %rbp,6*8(%rsp)
+ CFI_REL_OFFSET rbp, 6*8
movq %r8,(%rsp)
movq %r9,1*8(%rsp)
@@ -208,14 +214,22 @@
addl %ebx,%eax
adcl %r9d,%eax /* carry */
+ CFI_REMEMBER_STATE
.Lende:
movq 2*8(%rsp),%rbx
+ CFI_RESTORE rbx
movq 3*8(%rsp),%r12
+ CFI_RESTORE r12
movq 4*8(%rsp),%r14
+ CFI_RESTORE r14
movq 5*8(%rsp),%r13
+ CFI_RESTORE r13
movq 6*8(%rsp),%rbp
+ CFI_RESTORE rbp
addq $7*8,%rsp
+ CFI_ADJUST_CFA_OFFSET -7*8
ret
+ CFI_RESTORE_STATE
/* Exception handlers. Very simple, zeroing is done in the wrappers */
.Lbad_source:
@@ -231,3 +245,5 @@
jz .Lende
movl $-EFAULT,(%rax)
jmp .Lende
+ CFI_ENDPROC
+ENDPROC(csum_partial_copy_generic)
diff --git a/arch/x86_64/lib/getuser.S b/arch/x86_64/lib/getuser.S
index 3844d5e..5448876 100644
--- a/arch/x86_64/lib/getuser.S
+++ b/arch/x86_64/lib/getuser.S
@@ -27,25 +27,26 @@
*/
#include <linux/linkage.h>
+#include <asm/dwarf2.h>
#include <asm/page.h>
#include <asm/errno.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
.text
- .p2align 4
-.globl __get_user_1
-__get_user_1:
+ENTRY(__get_user_1)
+ CFI_STARTPROC
GET_THREAD_INFO(%r8)
cmpq threadinfo_addr_limit(%r8),%rcx
jae bad_get_user
1: movzb (%rcx),%edx
xorl %eax,%eax
ret
+ CFI_ENDPROC
+ENDPROC(__get_user_1)
- .p2align 4
-.globl __get_user_2
-__get_user_2:
+ENTRY(__get_user_2)
+ CFI_STARTPROC
GET_THREAD_INFO(%r8)
addq $1,%rcx
jc 20f
@@ -57,10 +58,11 @@
ret
20: decq %rcx
jmp bad_get_user
+ CFI_ENDPROC
+ENDPROC(__get_user_2)
- .p2align 4
-.globl __get_user_4
-__get_user_4:
+ENTRY(__get_user_4)
+ CFI_STARTPROC
GET_THREAD_INFO(%r8)
addq $3,%rcx
jc 30f
@@ -72,10 +74,11 @@
ret
30: subq $3,%rcx
jmp bad_get_user
+ CFI_ENDPROC
+ENDPROC(__get_user_4)
- .p2align 4
-.globl __get_user_8
-__get_user_8:
+ENTRY(__get_user_8)
+ CFI_STARTPROC
GET_THREAD_INFO(%r8)
addq $7,%rcx
jc 40f
@@ -87,11 +90,16 @@
ret
40: subq $7,%rcx
jmp bad_get_user
+ CFI_ENDPROC
+ENDPROC(__get_user_8)
bad_get_user:
+ CFI_STARTPROC
xorl %edx,%edx
movq $(-EFAULT),%rax
ret
+ CFI_ENDPROC
+END(bad_get_user)
.section __ex_table,"a"
.quad 1b,bad_get_user
diff --git a/arch/x86_64/lib/iomap_copy.S b/arch/x86_64/lib/iomap_copy.S
index 8bbade5..05a95e7 100644
--- a/arch/x86_64/lib/iomap_copy.S
+++ b/arch/x86_64/lib/iomap_copy.S
@@ -15,12 +15,16 @@
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
*/
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
/*
* override generic version in lib/iomap_copy.c
*/
- .globl __iowrite32_copy
- .p2align 4
-__iowrite32_copy:
+ENTRY(__iowrite32_copy)
+ CFI_STARTPROC
movl %edx,%ecx
rep movsd
ret
+ CFI_ENDPROC
+ENDPROC(__iowrite32_copy)
diff --git a/arch/x86_64/lib/memcpy.S b/arch/x86_64/lib/memcpy.S
index 5554948..967b22f 100644
--- a/arch/x86_64/lib/memcpy.S
+++ b/arch/x86_64/lib/memcpy.S
@@ -1,6 +1,10 @@
/* Copyright 2002 Andi Kleen */
- #include <asm/cpufeature.h>
+#include <linux/config.h>
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
+
/*
* memcpy - Copy a memory block.
*
@@ -13,12 +17,26 @@
* rax original destination
*/
- .globl __memcpy
- .globl memcpy
- .p2align 4
-__memcpy:
-memcpy:
+ ALIGN
+memcpy_c:
+ CFI_STARTPROC
+ movq %rdi,%rax
+ movl %edx,%ecx
+ shrl $3,%ecx
+ andl $7,%edx
+ rep movsq
+ movl %edx,%ecx
+ rep movsb
+ ret
+ CFI_ENDPROC
+ENDPROC(memcpy_c)
+
+ENTRY(__memcpy)
+ENTRY(memcpy)
+ CFI_STARTPROC
pushq %rbx
+ CFI_ADJUST_CFA_OFFSET 8
+ CFI_REL_OFFSET rbx, 0
movq %rdi,%rax
movl %edx,%ecx
@@ -86,36 +104,27 @@
.Lende:
popq %rbx
+ CFI_ADJUST_CFA_OFFSET -8
+ CFI_RESTORE rbx
ret
.Lfinal:
+ CFI_ENDPROC
+ENDPROC(memcpy)
+ENDPROC(__memcpy)
/* Some CPUs run faster using the string copy instructions.
It is also a lot simpler. Use this when possible */
+ .section .altinstr_replacement,"ax"
+1: .byte 0xeb /* jmp <disp8> */
+ .byte (memcpy_c - memcpy) - (2f - 1b) /* offset */
+2:
+ .previous
.section .altinstructions,"a"
.align 8
- .quad memcpy
- .quad memcpy_c
- .byte X86_FEATURE_REP_GOOD
- .byte .Lfinal-memcpy
- .byte memcpy_c_end-memcpy_c
- .previous
-
- .section .altinstr_replacement,"ax"
- /* rdi destination
- * rsi source
- * rdx count
- */
-memcpy_c:
- movq %rdi,%rax
- movl %edx,%ecx
- shrl $3,%ecx
- andl $7,%edx
- rep
- movsq
- movl %edx,%ecx
- rep
- movsb
- ret
-memcpy_c_end:
+ .quad memcpy
+ .quad 1b
+ .byte X86_FEATURE_REP_GOOD
+ .byte .Lfinal - memcpy
+ .byte 2b - 1b
.previous
diff --git a/arch/x86_64/lib/memset.S b/arch/x86_64/lib/memset.S
index ad397f2..09ed1f6 100644
--- a/arch/x86_64/lib/memset.S
+++ b/arch/x86_64/lib/memset.S
@@ -1,4 +1,9 @@
/* Copyright 2002 Andi Kleen, SuSE Labs */
+
+#include <linux/config.h>
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
/*
* ISO C memset - set a memory block to a byte value.
*
@@ -8,11 +13,29 @@
*
* rax original destination
*/
- .globl __memset
- .globl memset
- .p2align 4
-memset:
-__memset:
+ ALIGN
+memset_c:
+ CFI_STARTPROC
+ movq %rdi,%r9
+ movl %edx,%r8d
+ andl $7,%r8d
+ movl %edx,%ecx
+ shrl $3,%ecx
+ /* expand byte value */
+ movzbl %sil,%esi
+ movabs $0x0101010101010101,%rax
+ mulq %rsi /* with rax, clobbers rdx */
+ rep stosq
+ movl %r8d,%ecx
+ rep stosb
+ movq %r9,%rax
+ ret
+ CFI_ENDPROC
+ENDPROC(memset_c)
+
+ENTRY(memset)
+ENTRY(__memset)
+ CFI_STARTPROC
movq %rdi,%r10
movq %rdx,%r11
@@ -25,6 +48,7 @@
movl %edi,%r9d
andl $7,%r9d
jnz .Lbad_alignment
+ CFI_REMEMBER_STATE
.Lafter_bad_alignment:
movl %r11d,%ecx
@@ -75,6 +99,7 @@
movq %r10,%rax
ret
+ CFI_RESTORE_STATE
.Lbad_alignment:
cmpq $7,%r11
jbe .Lhandle_7
@@ -84,42 +109,26 @@
addq %r8,%rdi
subq %r8,%r11
jmp .Lafter_bad_alignment
+.Lfinal:
+ CFI_ENDPROC
+ENDPROC(memset)
+ENDPROC(__memset)
/* Some CPUs run faster using the string instructions.
It is also a lot simpler. Use this when possible */
#include <asm/cpufeature.h>
+ .section .altinstr_replacement,"ax"
+1: .byte 0xeb /* jmp <disp8> */
+ .byte (memset_c - memset) - (2f - 1b) /* offset */
+2:
+ .previous
.section .altinstructions,"a"
.align 8
- .quad memset
- .quad memset_c
- .byte X86_FEATURE_REP_GOOD
- .byte memset_c_end-memset_c
- .byte memset_c_end-memset_c
- .previous
-
- .section .altinstr_replacement,"ax"
- /* rdi destination
- * rsi value
- * rdx count
- */
-memset_c:
- movq %rdi,%r9
- movl %edx,%r8d
- andl $7,%r8d
- movl %edx,%ecx
- shrl $3,%ecx
- /* expand byte value */
- movzbl %sil,%esi
- movabs $0x0101010101010101,%rax
- mulq %rsi /* with rax, clobbers rdx */
- rep
- stosq
- movl %r8d,%ecx
- rep
- stosb
- movq %r9,%rax
- ret
-memset_c_end:
+ .quad memset
+ .quad 1b
+ .byte X86_FEATURE_REP_GOOD
+ .byte .Lfinal - memset
+ .byte 2b - 1b
.previous
diff --git a/arch/x86_64/lib/putuser.S b/arch/x86_64/lib/putuser.S
index 7f55939..4989f5a 100644
--- a/arch/x86_64/lib/putuser.S
+++ b/arch/x86_64/lib/putuser.S
@@ -25,25 +25,26 @@
*/
#include <linux/linkage.h>
+#include <asm/dwarf2.h>
#include <asm/page.h>
#include <asm/errno.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
.text
- .p2align 4
-.globl __put_user_1
-__put_user_1:
+ENTRY(__put_user_1)
+ CFI_STARTPROC
GET_THREAD_INFO(%r8)
cmpq threadinfo_addr_limit(%r8),%rcx
jae bad_put_user
1: movb %dl,(%rcx)
xorl %eax,%eax
ret
+ CFI_ENDPROC
+ENDPROC(__put_user_1)
- .p2align 4
-.globl __put_user_2
-__put_user_2:
+ENTRY(__put_user_2)
+ CFI_STARTPROC
GET_THREAD_INFO(%r8)
addq $1,%rcx
jc 20f
@@ -55,10 +56,11 @@
ret
20: decq %rcx
jmp bad_put_user
+ CFI_ENDPROC
+ENDPROC(__put_user_2)
- .p2align 4
-.globl __put_user_4
-__put_user_4:
+ENTRY(__put_user_4)
+ CFI_STARTPROC
GET_THREAD_INFO(%r8)
addq $3,%rcx
jc 30f
@@ -70,10 +72,11 @@
ret
30: subq $3,%rcx
jmp bad_put_user
+ CFI_ENDPROC
+ENDPROC(__put_user_4)
- .p2align 4
-.globl __put_user_8
-__put_user_8:
+ENTRY(__put_user_8)
+ CFI_STARTPROC
GET_THREAD_INFO(%r8)
addq $7,%rcx
jc 40f
@@ -85,10 +88,15 @@
ret
40: subq $7,%rcx
jmp bad_put_user
+ CFI_ENDPROC
+ENDPROC(__put_user_8)
bad_put_user:
+ CFI_STARTPROC
movq $(-EFAULT),%rax
ret
+ CFI_ENDPROC
+END(bad_put_user)
.section __ex_table,"a"
.quad 1b,bad_put_user
diff --git a/arch/x86_64/lib/rwlock.S b/arch/x86_64/lib/rwlock.S
new file mode 100644
index 0000000..0cde1f8
--- /dev/null
+++ b/arch/x86_64/lib/rwlock.S
@@ -0,0 +1,38 @@
+/* Slow paths of read/write spinlocks. */
+
+#include <linux/linkage.h>
+#include <asm/rwlock.h>
+#include <asm/alternative-asm.i>
+#include <asm/dwarf2.h>
+
+/* rdi: pointer to rwlock_t */
+ENTRY(__write_lock_failed)
+ CFI_STARTPROC
+ LOCK_PREFIX
+ addl $RW_LOCK_BIAS,(%rdi)
+1: rep
+ nop
+ cmpl $RW_LOCK_BIAS,(%rdi)
+ jne 1b
+ LOCK_PREFIX
+ subl $RW_LOCK_BIAS,(%rdi)
+ jnz __write_lock_failed
+ ret
+ CFI_ENDPROC
+END(__write_lock_failed)
+
+/* rdi: pointer to rwlock_t */
+ENTRY(__read_lock_failed)
+ CFI_STARTPROC
+ LOCK_PREFIX
+ incl (%rdi)
+1: rep
+ nop
+ cmpl $1,(%rdi)
+ js 1b
+ LOCK_PREFIX
+ decl (%rdi)
+ js __read_lock_failed
+ ret
+ CFI_ENDPROC
+END(__read_lock_failed)
diff --git a/arch/x86_64/lib/thunk.S b/arch/x86_64/lib/thunk.S
index 332ea5d..0025535 100644
--- a/arch/x86_64/lib/thunk.S
+++ b/arch/x86_64/lib/thunk.S
@@ -1,10 +1,9 @@
- /*
- * Save registers before calling assembly functions. This avoids
- * disturbance of register allocation in some inline assembly constructs.
- * Copyright 2001,2002 by Andi Kleen, SuSE Labs.
- * Subject to the GNU public license, v.2. No warranty of any kind.
- * $Id: thunk.S,v 1.2 2002/03/13 20:06:58 ak Exp $
- */
+/*
+ * Save registers before calling assembly functions. This avoids
+ * disturbance of register allocation in some inline assembly constructs.
+ * Copyright 2001,2002 by Andi Kleen, SuSE Labs.
+ * Subject to the GNU public license, v.2. No warranty of any kind.
+ */
#include <linux/config.h>
#include <linux/linkage.h>
@@ -67,33 +66,3 @@
RESTORE_ARGS 1
ret
CFI_ENDPROC
-
-#ifdef CONFIG_SMP
-/* Support for read/write spinlocks. */
- .text
-/* rax: pointer to rwlock_t */
-ENTRY(__write_lock_failed)
- lock
- addl $RW_LOCK_BIAS,(%rax)
-1: rep
- nop
- cmpl $RW_LOCK_BIAS,(%rax)
- jne 1b
- lock
- subl $RW_LOCK_BIAS,(%rax)
- jnz __write_lock_failed
- ret
-
-/* rax: pointer to rwlock_t */
-ENTRY(__read_lock_failed)
- lock
- incl (%rax)
-1: rep
- nop
- cmpl $1,(%rax)
- js 1b
- lock
- decl (%rax)
- js __read_lock_failed
- ret
-#endif
diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index 4198798..1a17b07 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -40,8 +40,7 @@
#define PF_RSVD (1<<3)
#define PF_INSTR (1<<4)
-#ifdef CONFIG_KPROBES
-ATOMIC_NOTIFIER_HEAD(notify_page_fault_chain);
+static ATOMIC_NOTIFIER_HEAD(notify_page_fault_chain);
/* Hook to register for page fault notifications */
int register_page_fault_notifier(struct notifier_block *nb)
@@ -49,11 +48,13 @@
vmalloc_sync_all();
return atomic_notifier_chain_register(¬ify_page_fault_chain, nb);
}
+EXPORT_SYMBOL_GPL(register_page_fault_notifier);
int unregister_page_fault_notifier(struct notifier_block *nb)
{
return atomic_notifier_chain_unregister(¬ify_page_fault_chain, nb);
}
+EXPORT_SYMBOL_GPL(unregister_page_fault_notifier);
static inline int notify_page_fault(enum die_val val, const char *str,
struct pt_regs *regs, long err, int trap, int sig)
@@ -67,13 +68,6 @@
};
return atomic_notifier_call_chain(¬ify_page_fault_chain, val, &args);
}
-#else
-static inline int notify_page_fault(enum die_val val, const char *str,
- struct pt_regs *regs, long err, int trap, int sig)
-{
- return NOTIFY_DONE;
-}
-#endif
void bust_spinlocks(int yes)
{
@@ -102,7 +96,7 @@
static noinline int is_prefetch(struct pt_regs *regs, unsigned long addr,
unsigned long error_code)
{
- unsigned char *instr;
+ unsigned char __user *instr;
int scan_more = 1;
int prefetch = 0;
unsigned char *max_instr;
@@ -111,7 +105,7 @@
if (error_code & PF_INSTR)
return 0;
- instr = (unsigned char *)convert_rip_to_linear(current, regs);
+ instr = (unsigned char __user *)convert_rip_to_linear(current, regs);
max_instr = instr + 15;
if (user_mode(regs) && instr >= (unsigned char *)TASK_SIZE)
@@ -122,7 +116,7 @@
unsigned char instr_hi;
unsigned char instr_lo;
- if (__get_user(opcode, instr))
+ if (__get_user(opcode, (char __user *)instr))
break;
instr_hi = opcode & 0xf0;
@@ -160,7 +154,7 @@
case 0x00:
/* Prefetch instruction is 0x0F0D or 0x0F18 */
scan_more = 0;
- if (__get_user(opcode, instr))
+ if (__get_user(opcode, (char __user *)instr))
break;
prefetch = (instr_lo == 0xF) &&
(opcode == 0x0D || opcode == 0x18);
@@ -176,7 +170,7 @@
static int bad_address(void *p)
{
unsigned long dummy;
- return __get_user(dummy, (unsigned long *)p);
+ return __get_user(dummy, (unsigned long __user *)p);
}
void dump_pagetable(unsigned long address)
diff --git a/arch/x86_64/mm/init.c b/arch/x86_64/mm/init.c
index 52fd42c..1e4669f 100644
--- a/arch/x86_64/mm/init.c
+++ b/arch/x86_64/mm/init.c
@@ -229,7 +229,6 @@
/* actually usually some more */
if (size >= LARGE_PAGE_SIZE) {
- printk("SMBIOS area too long %lu\n", size);
return NULL;
}
set_pmd(temp_mappings[0].pmd, __pmd(map | _KERNPG_TABLE | _PAGE_PSE));
@@ -250,12 +249,13 @@
}
static void __meminit
-phys_pmd_init(pmd_t *pmd, unsigned long address, unsigned long end)
+phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
{
- int i;
+ int i = pmd_index(address);
- for (i = 0; i < PTRS_PER_PMD; pmd++, i++, address += PMD_SIZE) {
+ for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
unsigned long entry;
+ pmd_t *pmd = pmd_page + pmd_index(address);
if (address >= end) {
if (!after_bootmem)
@@ -263,6 +263,10 @@
set_pmd(pmd, __pmd(0));
break;
}
+
+ if (pmd_val(*pmd))
+ continue;
+
entry = _PAGE_NX|_PAGE_PSE|_KERNPG_TABLE|_PAGE_GLOBAL|address;
entry &= __supported_pte_mask;
set_pmd(pmd, __pmd(entry));
@@ -272,45 +276,41 @@
static void __meminit
phys_pmd_update(pud_t *pud, unsigned long address, unsigned long end)
{
- pmd_t *pmd = pmd_offset(pud, (unsigned long)__va(address));
-
- if (pmd_none(*pmd)) {
- spin_lock(&init_mm.page_table_lock);
- phys_pmd_init(pmd, address, end);
- spin_unlock(&init_mm.page_table_lock);
- __flush_tlb_all();
- }
+ pmd_t *pmd = pmd_offset(pud,0);
+ spin_lock(&init_mm.page_table_lock);
+ phys_pmd_init(pmd, address, end);
+ spin_unlock(&init_mm.page_table_lock);
+ __flush_tlb_all();
}
-static void __meminit phys_pud_init(pud_t *pud, unsigned long address, unsigned long end)
+static void __meminit phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end)
{
- long i = pud_index(address);
+ int i = pud_index(addr);
- pud = pud + i;
- if (after_bootmem && pud_val(*pud)) {
- phys_pmd_update(pud, address, end);
- return;
- }
-
- for (; i < PTRS_PER_PUD; pud++, i++) {
+ for (; i < PTRS_PER_PUD; i++, addr = (addr & PUD_MASK) + PUD_SIZE ) {
int map;
- unsigned long paddr, pmd_phys;
+ unsigned long pmd_phys;
+ pud_t *pud = pud_page + pud_index(addr);
pmd_t *pmd;
- paddr = (address & PGDIR_MASK) + i*PUD_SIZE;
- if (paddr >= end)
+ if (addr >= end)
break;
- if (!after_bootmem && !e820_any_mapped(paddr, paddr+PUD_SIZE, 0)) {
+ if (!after_bootmem && !e820_any_mapped(addr,addr+PUD_SIZE,0)) {
set_pud(pud, __pud(0));
continue;
}
+ if (pud_val(*pud)) {
+ phys_pmd_update(pud, addr, end);
+ continue;
+ }
+
pmd = alloc_low_page(&map, &pmd_phys);
spin_lock(&init_mm.page_table_lock);
set_pud(pud, __pud(pmd_phys | _KERNPG_TABLE));
- phys_pmd_init(pmd, paddr, end);
+ phys_pmd_init(pmd, addr, end);
spin_unlock(&init_mm.page_table_lock);
unmap_low_page(map);
}
@@ -597,12 +597,6 @@
pci_iommu_alloc();
- /* How many end-of-memory variables you have, grandma! */
- max_low_pfn = end_pfn;
- max_pfn = end_pfn;
- num_physpages = end_pfn;
- high_memory = (void *) __va(end_pfn * PAGE_SIZE);
-
/* clear the zero-page */
memset(empty_zero_page, 0, PAGE_SIZE);
diff --git a/arch/x86_64/mm/k8topology.c b/arch/x86_64/mm/k8topology.c
index 7c45c2d..5cf594f 100644
--- a/arch/x86_64/mm/k8topology.c
+++ b/arch/x86_64/mm/k8topology.c
@@ -54,6 +54,9 @@
nodes_clear(nodes_parsed);
+ if (!early_pci_allowed())
+ return -1;
+
nb = find_northbridge();
if (nb < 0)
return nb;
diff --git a/arch/x86_64/mm/numa.c b/arch/x86_64/mm/numa.c
index b2fac14..322bf45 100644
--- a/arch/x86_64/mm/numa.c
+++ b/arch/x86_64/mm/numa.c
@@ -225,7 +225,7 @@
int numa_fake __initdata = 0;
/* Numa emulation */
-static int numa_emulation(unsigned long start_pfn, unsigned long end_pfn)
+static int __init numa_emulation(unsigned long start_pfn, unsigned long end_pfn)
{
int i;
struct bootnode nodes[MAX_NUMNODES];
@@ -348,9 +348,10 @@
}
}
-/* [numa=off] */
-__init int numa_setup(char *opt)
+static __init int numa_setup(char *opt)
{
+ if (!opt)
+ return -EINVAL;
if (!strncmp(opt,"off",3))
numa_off = 1;
#ifdef CONFIG_NUMA_EMU
@@ -366,9 +367,11 @@
if (!strncmp(opt,"hotadd=", 7))
hotadd_percent = simple_strtoul(opt+7, NULL, 10);
#endif
- return 1;
+ return 0;
}
+early_param("numa", numa_setup);
+
/*
* Setup early cpu_to_node.
*
diff --git a/arch/x86_64/mm/pageattr.c b/arch/x86_64/mm/pageattr.c
index 2685b1f..3e231d76 100644
--- a/arch/x86_64/mm/pageattr.c
+++ b/arch/x86_64/mm/pageattr.c
@@ -108,8 +108,8 @@
BUG_ON(pud_none(*pud));
pmd = pmd_offset(pud, address);
BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
- pgprot_val(ref_prot) |= _PAGE_PSE;
large_pte = mk_pte_phys(__pa(address) & LARGE_PAGE_MASK, ref_prot);
+ large_pte = pte_mkhuge(large_pte);
set_pte((pte_t *)pmd, large_pte);
}
@@ -119,32 +119,28 @@
{
pte_t *kpte;
struct page *kpte_page;
- unsigned kpte_flags;
pgprot_t ref_prot2;
kpte = lookup_address(address);
if (!kpte) return 0;
kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
- kpte_flags = pte_val(*kpte);
if (pgprot_val(prot) != pgprot_val(ref_prot)) {
- if ((kpte_flags & _PAGE_PSE) == 0) {
+ if (!pte_huge(*kpte)) {
set_pte(kpte, pfn_pte(pfn, prot));
} else {
/*
* split_large_page will take the reference for this
* change_page_attr on the split page.
*/
-
struct page *split;
- ref_prot2 = __pgprot(pgprot_val(pte_pgprot(*lookup_address(address))) & ~(1<<_PAGE_BIT_PSE));
-
+ ref_prot2 = pte_pgprot(pte_clrhuge(*kpte));
split = split_large_page(address, prot, ref_prot2);
if (!split)
return -ENOMEM;
- set_pte(kpte,mk_pte(split, ref_prot2));
+ set_pte(kpte, mk_pte(split, ref_prot2));
kpte_page = split;
- }
+ }
page_private(kpte_page)++;
- } else if ((kpte_flags & _PAGE_PSE) == 0) {
+ } else if (!pte_huge(*kpte)) {
set_pte(kpte, pfn_pte(pfn, ref_prot));
BUG_ON(page_private(kpte_page) == 0);
page_private(kpte_page)--;
@@ -190,10 +186,12 @@
* lowmem */
if (__pa(address) < KERNEL_TEXT_SIZE) {
unsigned long addr2;
- pgprot_t prot2 = prot;
+ pgprot_t prot2;
addr2 = __START_KERNEL_map + __pa(address);
- pgprot_val(prot2) &= ~_PAGE_NX;
- err = __change_page_attr(addr2, pfn, prot2, PAGE_KERNEL_EXEC);
+ /* Make sure the kernel mappings stay executable */
+ prot2 = pte_pgprot(pte_mkexec(pfn_pte(0, prot)));
+ err = __change_page_attr(addr2, pfn, prot2,
+ PAGE_KERNEL_EXEC);
}
}
up_write(&init_mm.mmap_sem);
diff --git a/arch/x86_64/mm/srat.c b/arch/x86_64/mm/srat.c
index 502fce6..ca10701 100644
--- a/arch/x86_64/mm/srat.c
+++ b/arch/x86_64/mm/srat.c
@@ -21,6 +21,8 @@
#include <asm/numa.h>
#include <asm/e820.h>
+int acpi_numa __initdata;
+
#if (defined(CONFIG_ACPI_HOTPLUG_MEMORY) || \
defined(CONFIG_ACPI_HOTPLUG_MEMORY_MODULE)) \
&& !defined(CONFIG_MEMORY_HOTPLUG)
diff --git a/arch/x86_64/pci/Makefile b/arch/x86_64/pci/Makefile
index a3f6ad5..1eb18f4 100644
--- a/arch/x86_64/pci/Makefile
+++ b/arch/x86_64/pci/Makefile
@@ -9,7 +9,7 @@
obj-$(CONFIG_PCI_DIRECT)+= direct.o
obj-y += fixup.o init.o
obj-$(CONFIG_ACPI) += acpi.o
-obj-y += legacy.o irq.o common.o
+obj-y += legacy.o irq.o common.o early.o
# mmconfig has a 64bit special
obj-$(CONFIG_PCI_MMCONFIG) += mmconfig.o direct.o
@@ -23,3 +23,4 @@
fixup-y += ../../i386/pci/fixup.o
i386-y += ../../i386/pci/i386.o
init-y += ../../i386/pci/init.o
+early-y += ../../i386/pci/early.o
diff --git a/arch/x86_64/pci/mmconfig.c b/arch/x86_64/pci/mmconfig.c
index 3c55c76..7732f42 100644
--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -156,15 +156,45 @@
addr = pci_dev_base(0, k, PCI_DEVFN(i, 0));
if (addr == NULL|| readl(addr) != val1) {
set_bit(i + 32*k, fallback_slots);
- printk(KERN_NOTICE
- "PCI: No mmconfig possible on device %x:%x\n",
- k, i);
+ printk(KERN_NOTICE "PCI: No mmconfig possible"
+ " on device %02x:%02x\n", k, i);
}
}
}
}
-void __init pci_mmcfg_init(void)
+static __init void pci_mmcfg_insert_resources(void)
+{
+#define PCI_MMCFG_RESOURCE_NAME_LEN 19
+ int i;
+ struct resource *res;
+ char *names;
+ unsigned num_buses;
+
+ res = kcalloc(PCI_MMCFG_RESOURCE_NAME_LEN + sizeof(*res),
+ pci_mmcfg_config_num, GFP_KERNEL);
+
+ if (!res) {
+ printk(KERN_ERR "PCI: Unable to allocate MMCONFIG resources\n");
+ return;
+ }
+
+ names = (void *)&res[pci_mmcfg_config_num];
+ for (i = 0; i < pci_mmcfg_config_num; i++, res++) {
+ num_buses = pci_mmcfg_config[i].end_bus_number -
+ pci_mmcfg_config[i].start_bus_number + 1;
+ res->name = names;
+ snprintf(names, PCI_MMCFG_RESOURCE_NAME_LEN, "PCI MMCONFIG %u",
+ pci_mmcfg_config[i].pci_segment_group_number);
+ res->start = pci_mmcfg_config[i].base_address;
+ res->end = res->start + (num_buses << 20) - 1;
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ insert_resource(&iomem_resource, res);
+ names += PCI_MMCFG_RESOURCE_NAME_LEN;
+ }
+}
+
+void __init pci_mmcfg_init(int type)
{
int i;
@@ -177,7 +207,9 @@
(pci_mmcfg_config[0].base_address == 0))
return;
- if (!e820_all_mapped(pci_mmcfg_config[0].base_address,
+ /* Only do this check when type 1 works. If it doesn't work
+ assume we run on a Mac and always use MCFG */
+ if (type == 1 && !e820_all_mapped(pci_mmcfg_config[0].base_address,
pci_mmcfg_config[0].base_address + MMCONFIG_APER_MIN,
E820_RESERVED)) {
printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %x is not E820-reserved\n",
@@ -186,7 +218,6 @@
return;
}
- /* RED-PEN i386 doesn't do _nocache right now */
pci_mmcfg_virt = kmalloc(sizeof(*pci_mmcfg_virt) * pci_mmcfg_config_num, GFP_KERNEL);
if (pci_mmcfg_virt == NULL) {
printk("PCI: Can not allocate memory for mmconfig structures\n");
@@ -205,6 +236,7 @@
}
unreachable_devices();
+ pci_mmcfg_insert_resources();
raw_pci_ops = &pci_mmcfg;
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 8afba33..58b0eb5 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -868,8 +868,8 @@
do_div(temp, period);
hpetp->hp_tick_freq = temp; /* ticks per second */
- printk(KERN_INFO "hpet%d: at MMIO 0x%lx (virtual 0x%p), IRQ%s",
- hpetp->hp_which, hdp->hd_phys_address, hdp->hd_address,
+ printk(KERN_INFO "hpet%d: at MMIO 0x%lx, IRQ%s",
+ hpetp->hp_which, hdp->hd_phys_address,
hpetp->hp_ntimer > 1 ? "s" : "");
for (i = 0; i < hpetp->hp_ntimer; i++)
printk("%s %d", i > 0 ? "," : "", hdp->hd_irq[i]);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8ab0278..590f4e6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -955,13 +955,12 @@
}
str = k;
}
- return 1;
+ return 0;
}
+early_param("pci", pci_setup);
device_initcall(pci_init);
-__setup("pci=", pci_setup);
-
#if defined(CONFIG_ISA) || defined(CONFIG_EISA)
/* FIXME: Some boxes have multiple ISA bridges! */
struct pci_dev *isa_bridge;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 64802aa..dfd8cfb 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -515,7 +515,8 @@
{
unsigned int random_variable = 0;
- if (current->flags & PF_RANDOMIZE) {
+ if ((current->flags & PF_RANDOMIZE) &&
+ !(current->personality & ADDR_NO_RANDOMIZE)) {
random_variable = get_random_int() & STACK_RND_MASK;
random_variable <<= PAGE_SHIFT;
}
diff --git a/fs/compat.c b/fs/compat.c
index e31e9cf..ce982f6 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1855,7 +1855,7 @@
} while (!ret && !timeout && tsp && (ts.tv_sec || ts.tv_nsec));
- if (tsp && !(current->personality & STICKY_TIMEOUTS)) {
+ if (ret == 0 && tsp && !(current->personality & STICKY_TIMEOUTS)) {
struct compat_timespec rts;
rts.tv_sec = timeout / HZ;
@@ -1866,7 +1866,8 @@
}
if (compat_timespec_compare(&rts, &ts) >= 0)
rts = ts;
- copy_to_user(tsp, &rts, sizeof(rts));
+ if (copy_to_user(tsp, &rts, sizeof(rts)))
+ ret = -EFAULT;
}
if (ret == -ERESTARTNOHAND) {
diff --git a/include/asm-i386/acpi.h b/include/asm-i386/acpi.h
index 20f5239..6016632 100644
--- a/include/asm-i386/acpi.h
+++ b/include/asm-i386/acpi.h
@@ -131,21 +131,7 @@
extern int acpi_gsi_to_irq(u32 gsi, unsigned int *irq);
#ifdef CONFIG_X86_IO_APIC
-extern int skip_ioapic_setup;
extern int acpi_skip_timer_override;
-
-static inline void disable_ioapic_setup(void)
-{
- skip_ioapic_setup = 1;
-}
-
-static inline int ioapic_setup_disabled(void)
-{
- return skip_ioapic_setup;
-}
-
-#else
-static inline void disable_ioapic_setup(void) { }
#endif
static inline void acpi_noirq_set(void) { acpi_noirq = 1; }
diff --git a/include/asm-i386/alternative-asm.i b/include/asm-i386/alternative-asm.i
new file mode 100644
index 0000000..6c47e3b
--- /dev/null
+++ b/include/asm-i386/alternative-asm.i
@@ -0,0 +1,14 @@
+#include <linux/config.h>
+
+#ifdef CONFIG_SMP
+ .macro LOCK_PREFIX
+1: lock
+ .section .smp_locks,"a"
+ .align 4
+ .long 1b
+ .previous
+ .endm
+#else
+ .macro LOCK_PREFIX
+ .endm
+#endif
diff --git a/include/asm-i386/apic.h b/include/asm-i386/apic.h
index 2c1e371..3a42b7d 100644
--- a/include/asm-i386/apic.h
+++ b/include/asm-i386/apic.h
@@ -16,20 +16,8 @@
#define APIC_VERBOSE 1
#define APIC_DEBUG 2
-extern int enable_local_apic;
extern int apic_verbosity;
-static inline void lapic_disable(void)
-{
- enable_local_apic = -1;
- clear_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability);
-}
-
-static inline void lapic_enable(void)
-{
- enable_local_apic = 1;
-}
-
/*
* Define the default level of output to be very little
* This can be turned up by using apic=verbose for more
@@ -42,6 +30,8 @@
} while (0)
+extern void generic_apic_probe(void);
+
#ifdef CONFIG_X86_LOCAL_APIC
/*
@@ -117,8 +107,6 @@
extern void enable_NMI_through_LVT0 (void * dummy);
-extern int disable_timer_pin_1;
-
void smp_send_timer_broadcast_ipi(struct pt_regs *regs);
void switch_APIC_timer_to_ipi(void *cpumask);
void switch_ipi_to_APIC_timer(void *cpumask);
diff --git a/include/asm-i386/desc.h b/include/asm-i386/desc.h
index 89b8b82..5874ef1 100644
--- a/include/asm-i386/desc.h
+++ b/include/asm-i386/desc.h
@@ -33,50 +33,99 @@
return (struct desc_struct *)per_cpu(cpu_gdt_descr, cpu).address;
}
-#define load_TR_desc() __asm__ __volatile__("ltr %w0"::"q" (GDT_ENTRY_TSS*8))
-#define load_LDT_desc() __asm__ __volatile__("lldt %w0"::"q" (GDT_ENTRY_LDT*8))
-
-#define load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr))
-#define load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr))
-#define load_tr(tr) __asm__ __volatile("ltr %0"::"mr" (tr))
-#define load_ldt(ldt) __asm__ __volatile("lldt %0"::"mr" (ldt))
-
-#define store_gdt(dtr) __asm__ ("sgdt %0":"=m" (*dtr))
-#define store_idt(dtr) __asm__ ("sidt %0":"=m" (*dtr))
-#define store_tr(tr) __asm__ ("str %0":"=mr" (tr))
-#define store_ldt(ldt) __asm__ ("sldt %0":"=mr" (ldt))
-
/*
* This is the ldt that every process will get unless we need
* something other than this.
*/
extern struct desc_struct default_ldt[];
+extern struct desc_struct idt_table[];
extern void set_intr_gate(unsigned int irq, void * addr);
-#define _set_tssldt_desc(n,addr,limit,type) \
-__asm__ __volatile__ ("movw %w3,0(%2)\n\t" \
- "movw %w1,2(%2)\n\t" \
- "rorl $16,%1\n\t" \
- "movb %b1,4(%2)\n\t" \
- "movb %4,5(%2)\n\t" \
- "movb $0,6(%2)\n\t" \
- "movb %h1,7(%2)\n\t" \
- "rorl $16,%1" \
- : "=m"(*(n)) : "q" (addr), "r"(n), "ir"(limit), "i"(type))
-
-static inline void __set_tss_desc(unsigned int cpu, unsigned int entry, void *addr)
+static inline void pack_descriptor(__u32 *a, __u32 *b,
+ unsigned long base, unsigned long limit, unsigned char type, unsigned char flags)
{
- _set_tssldt_desc(&get_cpu_gdt_table(cpu)[entry], (int)addr,
- offsetof(struct tss_struct, __cacheline_filler) - 1, 0x89);
+ *a = ((base & 0xffff) << 16) | (limit & 0xffff);
+ *b = (base & 0xff000000) | ((base & 0xff0000) >> 16) |
+ (limit & 0x000f0000) | ((type & 0xff) << 8) | ((flags & 0xf) << 20);
+}
+
+static inline void pack_gate(__u32 *a, __u32 *b,
+ unsigned long base, unsigned short seg, unsigned char type, unsigned char flags)
+{
+ *a = (seg << 16) | (base & 0xffff);
+ *b = (base & 0xffff0000) | ((type & 0xff) << 8) | (flags & 0xff);
+}
+
+#define DESCTYPE_LDT 0x82 /* present, system, DPL-0, LDT */
+#define DESCTYPE_TSS 0x89 /* present, system, DPL-0, 32-bit TSS */
+#define DESCTYPE_TASK 0x85 /* present, system, DPL-0, task gate */
+#define DESCTYPE_INT 0x8e /* present, system, DPL-0, interrupt gate */
+#define DESCTYPE_TRAP 0x8f /* present, system, DPL-0, trap gate */
+#define DESCTYPE_DPL3 0x60 /* DPL-3 */
+#define DESCTYPE_S 0x10 /* !system */
+
+#define load_TR_desc() __asm__ __volatile__("ltr %w0"::"q" (GDT_ENTRY_TSS*8))
+#define load_LDT_desc() __asm__ __volatile__("lldt %w0"::"q" (GDT_ENTRY_LDT*8))
+
+#define load_gdt(dtr) __asm__ __volatile("lgdt %0"::"m" (*dtr))
+#define load_idt(dtr) __asm__ __volatile("lidt %0"::"m" (*dtr))
+#define load_tr(tr) __asm__ __volatile("ltr %0"::"m" (tr))
+#define load_ldt(ldt) __asm__ __volatile("lldt %0"::"m" (ldt))
+
+#define store_gdt(dtr) __asm__ ("sgdt %0":"=m" (*dtr))
+#define store_idt(dtr) __asm__ ("sidt %0":"=m" (*dtr))
+#define store_tr(tr) __asm__ ("str %0":"=m" (tr))
+#define store_ldt(ldt) __asm__ ("sldt %0":"=m" (ldt))
+
+#if TLS_SIZE != 24
+# error update this code.
+#endif
+
+static inline void load_TLS(struct thread_struct *t, unsigned int cpu)
+{
+#define C(i) get_cpu_gdt_table(cpu)[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i]
+ C(0); C(1); C(2);
+#undef C
+}
+
+static inline void write_dt_entry(void *dt, int entry, __u32 entry_a, __u32 entry_b)
+{
+ __u32 *lp = (__u32 *)((char *)dt + entry*8);
+ *lp = entry_a;
+ *(lp+1) = entry_b;
+}
+
+#define write_ldt_entry(dt, entry, a, b) write_dt_entry(dt, entry, a, b)
+#define write_gdt_entry(dt, entry, a, b) write_dt_entry(dt, entry, a, b)
+#define write_idt_entry(dt, entry, a, b) write_dt_entry(dt, entry, a, b)
+
+static inline void _set_gate(int gate, unsigned int type, void *addr, unsigned short seg)
+{
+ __u32 a, b;
+ pack_gate(&a, &b, (unsigned long)addr, seg, type, 0);
+ write_idt_entry(idt_table, gate, a, b);
+}
+
+static inline void __set_tss_desc(unsigned int cpu, unsigned int entry, const void *addr)
+{
+ __u32 a, b;
+ pack_descriptor(&a, &b, (unsigned long)addr,
+ offsetof(struct tss_struct, __cacheline_filler) - 1,
+ DESCTYPE_TSS, 0);
+ write_gdt_entry(get_cpu_gdt_table(cpu), entry, a, b);
+}
+
+static inline void set_ldt_desc(unsigned int cpu, void *addr, unsigned int entries)
+{
+ __u32 a, b;
+ pack_descriptor(&a, &b, (unsigned long)addr,
+ entries * sizeof(struct desc_struct) - 1,
+ DESCTYPE_LDT, 0);
+ write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_LDT, a, b);
}
#define set_tss_desc(cpu,addr) __set_tss_desc(cpu, GDT_ENTRY_TSS, addr)
-static inline void set_ldt_desc(unsigned int cpu, void *addr, unsigned int size)
-{
- _set_tssldt_desc(&get_cpu_gdt_table(cpu)[GDT_ENTRY_LDT], (int)addr, ((size << 3)-1), 0x82);
-}
-
#define LDT_entry_a(info) \
((((info)->base_addr & 0x0000ffff) << 16) | ((info)->limit & 0x0ffff))
@@ -102,24 +151,6 @@
(info)->seg_not_present == 1 && \
(info)->useable == 0 )
-static inline void write_ldt_entry(void *ldt, int entry, __u32 entry_a, __u32 entry_b)
-{
- __u32 *lp = (__u32 *)((char *)ldt + entry*8);
- *lp = entry_a;
- *(lp+1) = entry_b;
-}
-
-#if TLS_SIZE != 24
-# error update this code.
-#endif
-
-static inline void load_TLS(struct thread_struct *t, unsigned int cpu)
-{
-#define C(i) get_cpu_gdt_table(cpu)[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i]
- C(0); C(1); C(2);
-#undef C
-}
-
static inline void clear_LDT(void)
{
int cpu = get_cpu();
diff --git a/include/asm-i386/dwarf2.h b/include/asm-i386/dwarf2.h
index 2280f62..6d66398 100644
--- a/include/asm-i386/dwarf2.h
+++ b/include/asm-i386/dwarf2.h
@@ -1,8 +1,6 @@
#ifndef _DWARF2_H
#define _DWARF2_H
-#include <linux/config.h>
-
#ifndef __ASSEMBLY__
#warning "asm/dwarf2.h should be only included in pure assembly files"
#endif
@@ -28,6 +26,13 @@
#define CFI_RESTORE .cfi_restore
#define CFI_REMEMBER_STATE .cfi_remember_state
#define CFI_RESTORE_STATE .cfi_restore_state
+#define CFI_UNDEFINED .cfi_undefined
+
+#ifdef CONFIG_AS_CFI_SIGNAL_FRAME
+#define CFI_SIGNAL_FRAME .cfi_signal_frame
+#else
+#define CFI_SIGNAL_FRAME
+#endif
#else
@@ -48,6 +53,8 @@
#define CFI_RESTORE ignore
#define CFI_REMEMBER_STATE ignore
#define CFI_RESTORE_STATE ignore
+#define CFI_UNDEFINED ignore
+#define CFI_SIGNAL_FRAME ignore
#endif
diff --git a/include/asm-i386/e820.h b/include/asm-i386/e820.h
index ca82acb..f7514fb 100644
--- a/include/asm-i386/e820.h
+++ b/include/asm-i386/e820.h
@@ -18,7 +18,7 @@
#define E820_RAM 1
#define E820_RESERVED 2
-#define E820_ACPI 3 /* usable as RAM once ACPI tables have been read */
+#define E820_ACPI 3
#define E820_NVS 4
#define HIGH_MEMORY (1024*1024)
diff --git a/include/asm-i386/frame.i b/include/asm-i386/frame.i
new file mode 100644
index 0000000..4d68ddc
--- /dev/null
+++ b/include/asm-i386/frame.i
@@ -0,0 +1,24 @@
+#include <linux/config.h>
+#include <asm/dwarf2.h>
+
+/* The annotation hides the frame from the unwinder and makes it look
+ like a ordinary ebp save/restore. This avoids some special cases for
+ frame pointer later */
+#ifdef CONFIG_FRAME_POINTER
+ .macro FRAME
+ pushl %ebp
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ebp,0
+ movl %esp,%ebp
+ .endm
+ .macro ENDFRAME
+ popl %ebp
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE ebp
+ .endm
+#else
+ .macro FRAME
+ .endm
+ .macro ENDFRAME
+ .endm
+#endif
diff --git a/include/asm-i386/genapic.h b/include/asm-i386/genapic.h
index b3783a3..8ffbb0f 100644
--- a/include/asm-i386/genapic.h
+++ b/include/asm-i386/genapic.h
@@ -1,6 +1,8 @@
#ifndef _ASM_GENAPIC_H
#define _ASM_GENAPIC_H 1
+#include <asm/mpspec.h>
+
/*
* Generic APIC driver interface.
*
@@ -63,14 +65,25 @@
unsigned (*get_apic_id)(unsigned long x);
unsigned long apic_id_mask;
unsigned int (*cpu_mask_to_apicid)(cpumask_t cpumask);
-
+
+#ifdef CONFIG_SMP
/* ipi */
void (*send_IPI_mask)(cpumask_t mask, int vector);
void (*send_IPI_allbutself)(int vector);
void (*send_IPI_all)(int vector);
+#endif
};
-#define APICFUNC(x) .x = x
+#define APICFUNC(x) .x = x,
+
+/* More functions could be probably marked IPIFUNC and save some space
+ in UP GENERICARCH kernels, but I don't have the nerve right now
+ to untangle this mess. -AK */
+#ifdef CONFIG_SMP
+#define IPIFUNC(x) APICFUNC(x)
+#else
+#define IPIFUNC(x)
+#endif
#define APIC_INIT(aname, aprobe) { \
.name = aname, \
@@ -80,33 +93,33 @@
.no_balance_irq = NO_BALANCE_IRQ, \
.ESR_DISABLE = esr_disable, \
.apic_destination_logical = APIC_DEST_LOGICAL, \
- APICFUNC(apic_id_registered), \
- APICFUNC(target_cpus), \
- APICFUNC(check_apicid_used), \
- APICFUNC(check_apicid_present), \
- APICFUNC(init_apic_ldr), \
- APICFUNC(ioapic_phys_id_map), \
- APICFUNC(clustered_apic_check), \
- APICFUNC(multi_timer_check), \
- APICFUNC(apicid_to_node), \
- APICFUNC(cpu_to_logical_apicid), \
- APICFUNC(cpu_present_to_apicid), \
- APICFUNC(apicid_to_cpu_present), \
- APICFUNC(mpc_apic_id), \
- APICFUNC(setup_portio_remap), \
- APICFUNC(check_phys_apicid_present), \
- APICFUNC(mpc_oem_bus_info), \
- APICFUNC(mpc_oem_pci_bus), \
- APICFUNC(mps_oem_check), \
- APICFUNC(get_apic_id), \
+ APICFUNC(apic_id_registered) \
+ APICFUNC(target_cpus) \
+ APICFUNC(check_apicid_used) \
+ APICFUNC(check_apicid_present) \
+ APICFUNC(init_apic_ldr) \
+ APICFUNC(ioapic_phys_id_map) \
+ APICFUNC(clustered_apic_check) \
+ APICFUNC(multi_timer_check) \
+ APICFUNC(apicid_to_node) \
+ APICFUNC(cpu_to_logical_apicid) \
+ APICFUNC(cpu_present_to_apicid) \
+ APICFUNC(apicid_to_cpu_present) \
+ APICFUNC(mpc_apic_id) \
+ APICFUNC(setup_portio_remap) \
+ APICFUNC(check_phys_apicid_present) \
+ APICFUNC(mpc_oem_bus_info) \
+ APICFUNC(mpc_oem_pci_bus) \
+ APICFUNC(mps_oem_check) \
+ APICFUNC(get_apic_id) \
.apic_id_mask = APIC_ID_MASK, \
- APICFUNC(cpu_mask_to_apicid), \
- APICFUNC(acpi_madt_oem_check), \
- APICFUNC(send_IPI_mask), \
- APICFUNC(send_IPI_allbutself), \
- APICFUNC(send_IPI_all), \
- APICFUNC(enable_apic_mode), \
- APICFUNC(phys_pkg_id), \
+ APICFUNC(cpu_mask_to_apicid) \
+ APICFUNC(acpi_madt_oem_check) \
+ IPIFUNC(send_IPI_mask) \
+ IPIFUNC(send_IPI_allbutself) \
+ IPIFUNC(send_IPI_all) \
+ APICFUNC(enable_apic_mode) \
+ APICFUNC(phys_pkg_id) \
}
extern struct genapic *genapic;
diff --git a/include/asm-i386/intel_arch_perfmon.h b/include/asm-i386/intel_arch_perfmon.h
index 134ea9c..b52cd60 100644
--- a/include/asm-i386/intel_arch_perfmon.h
+++ b/include/asm-i386/intel_arch_perfmon.h
@@ -14,6 +14,18 @@
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL (0x3c)
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK (0x00 << 8)
-#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT (1 << 0)
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX (0)
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \
+ (1 << (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX))
+
+union cpuid10_eax {
+ struct {
+ unsigned int version_id:8;
+ unsigned int num_counters:8;
+ unsigned int bit_width:8;
+ unsigned int mask_length:8;
+ } split;
+ unsigned int full;
+};
#endif /* X86_INTEL_ARCH_PERFMON_H */
diff --git a/include/asm-i386/io_apic.h b/include/asm-i386/io_apic.h
index 5092e81..5d30927 100644
--- a/include/asm-i386/io_apic.h
+++ b/include/asm-i386/io_apic.h
@@ -188,6 +188,16 @@
/* 1 if "noapic" boot option passed */
extern int skip_ioapic_setup;
+static inline void disable_ioapic_setup(void)
+{
+ skip_ioapic_setup = 1;
+}
+
+static inline int ioapic_setup_disabled(void)
+{
+ return skip_ioapic_setup;
+}
+
/*
* If we use the IO-APIC for IRQ routing, disable automatic
* assignment of PCI IRQ's.
@@ -206,6 +216,7 @@
#else /* !CONFIG_X86_IO_APIC */
#define io_apic_assign_pci_irqs 0
+static inline void disable_ioapic_setup(void) { }
#endif
extern int assign_irq_vector(int irq);
diff --git a/include/asm-i386/kexec.h b/include/asm-i386/kexec.h
index 53f0e06..4dfc9f5 100644
--- a/include/asm-i386/kexec.h
+++ b/include/asm-i386/kexec.h
@@ -1,6 +1,26 @@
#ifndef _I386_KEXEC_H
#define _I386_KEXEC_H
+#define PA_CONTROL_PAGE 0
+#define VA_CONTROL_PAGE 1
+#define PA_PGD 2
+#define VA_PGD 3
+#define PA_PTE_0 4
+#define VA_PTE_0 5
+#define PA_PTE_1 6
+#define VA_PTE_1 7
+#ifdef CONFIG_X86_PAE
+#define PA_PMD_0 8
+#define VA_PMD_0 9
+#define PA_PMD_1 10
+#define VA_PMD_1 11
+#define PAGES_NR 12
+#else
+#define PAGES_NR 8
+#endif
+
+#ifndef __ASSEMBLY__
+
#include <asm/fixmap.h>
#include <asm/ptrace.h>
#include <asm/string.h>
@@ -72,5 +92,12 @@
newregs->eip = (unsigned long)current_text_addr();
}
}
+asmlinkage NORET_TYPE void
+relocate_kernel(unsigned long indirection_page,
+ unsigned long control_page,
+ unsigned long start_address,
+ unsigned int has_pae) ATTRIB_NORET;
+
+#endif /* __ASSEMBLY__ */
#endif /* _I386_KEXEC_H */
diff --git a/include/asm-i386/mach-es7000/mach_apic.h b/include/asm-i386/mach-es7000/mach_apic.h
index b5f3f0d..2633368 100644
--- a/include/asm-i386/mach-es7000/mach_apic.h
+++ b/include/asm-i386/mach-es7000/mach_apic.h
@@ -123,9 +123,13 @@
/* Mapping from cpu number to logical apicid */
static inline int cpu_to_logical_apicid(int cpu)
{
+#ifdef CONFIG_SMP
if (cpu >= NR_CPUS)
return BAD_APICID;
return (int)cpu_2_logical_apicid[cpu];
+#else
+ return logical_smp_processor_id();
+#endif
}
static inline int mpc_apic_id(struct mpc_config_processor *m, struct mpc_config_translation *unused)
diff --git a/include/asm-i386/mach-summit/mach_apic.h b/include/asm-i386/mach-summit/mach_apic.h
index 9fd0732..a81b059 100644
--- a/include/asm-i386/mach-summit/mach_apic.h
+++ b/include/asm-i386/mach-summit/mach_apic.h
@@ -46,10 +46,12 @@
static inline void init_apic_ldr(void)
{
unsigned long val, id;
- int i, count;
- u8 lid;
+ int count = 0;
u8 my_id = (u8)hard_smp_processor_id();
u8 my_cluster = (u8)apicid_cluster(my_id);
+#ifdef CONFIG_SMP
+ u8 lid;
+ int i;
/* Create logical APIC IDs by counting CPUs already in cluster. */
for (count = 0, i = NR_CPUS; --i >= 0; ) {
@@ -57,6 +59,7 @@
if (lid != BAD_APICID && apicid_cluster(lid) == my_cluster)
++count;
}
+#endif
/* We only have a 4 wide bitmap in cluster mode. If a deranged
* BIOS puts 5 CPUs in one APIC cluster, we're hosed. */
BUG_ON(count >= XAPIC_DEST_CPUS_SHIFT);
@@ -91,9 +94,13 @@
/* Mapping from cpu number to logical apicid */
static inline int cpu_to_logical_apicid(int cpu)
{
+#ifdef CONFIG_SMP
if (cpu >= NR_CPUS)
return BAD_APICID;
return (int)cpu_2_logical_apicid[cpu];
+#else
+ return logical_smp_processor_id();
+#endif
}
static inline int cpu_present_to_apicid(int mps_cpu)
diff --git a/include/asm-i386/mutex.h b/include/asm-i386/mutex.h
index 05a5385..7a17d9e 100644
--- a/include/asm-i386/mutex.h
+++ b/include/asm-i386/mutex.h
@@ -30,14 +30,10 @@
\
__asm__ __volatile__( \
LOCK_PREFIX " decl (%%eax) \n" \
- " js 2f \n" \
+ " jns 1f \n" \
+ " call "#fail_fn" \n" \
"1: \n" \
\
- LOCK_SECTION_START("") \
- "2: call "#fail_fn" \n" \
- " jmp 1b \n" \
- LOCK_SECTION_END \
- \
:"=a" (dummy) \
: "a" (count) \
: "memory", "ecx", "edx"); \
@@ -86,14 +82,10 @@
\
__asm__ __volatile__( \
LOCK_PREFIX " incl (%%eax) \n" \
- " jle 2f \n" \
+ " jg 1f \n" \
+ " call "#fail_fn" \n" \
"1: \n" \
\
- LOCK_SECTION_START("") \
- "2: call "#fail_fn" \n" \
- " jmp 1b \n" \
- LOCK_SECTION_END \
- \
:"=a" (dummy) \
: "a" (count) \
: "memory", "ecx", "edx"); \
diff --git a/include/asm-i386/nmi.h b/include/asm-i386/nmi.h
index 67d9947..303bcd4 100644
--- a/include/asm-i386/nmi.h
+++ b/include/asm-i386/nmi.h
@@ -6,32 +6,29 @@
#include <linux/pm.h>
-struct pt_regs;
-
-typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu);
-
/**
- * set_nmi_callback
+ * do_nmi_callback
*
- * Set a handler for an NMI. Only one handler may be
- * set. Return 1 if the NMI was handled.
+ * Check to see if a callback exists and execute it. Return 1
+ * if the handler exists and was handled successfully.
*/
-void set_nmi_callback(nmi_callback_t callback);
+int do_nmi_callback(struct pt_regs *regs, int cpu);
-/**
- * unset_nmi_callback
- *
- * Remove the handler previously set.
- */
-void unset_nmi_callback(void);
+extern int nmi_watchdog_enabled;
+extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
+extern int avail_to_resrv_perfctr_nmi(unsigned int);
+extern int reserve_perfctr_nmi(unsigned int);
+extern void release_perfctr_nmi(unsigned int);
+extern int reserve_evntsel_nmi(unsigned int);
+extern void release_evntsel_nmi(unsigned int);
-extern void setup_apic_nmi_watchdog (void);
-extern int reserve_lapic_nmi(void);
-extern void release_lapic_nmi(void);
+extern void setup_apic_nmi_watchdog (void *);
+extern void stop_apic_nmi_watchdog (void *);
extern void disable_timer_nmi_watchdog(void);
extern void enable_timer_nmi_watchdog(void);
-extern void nmi_watchdog_tick (struct pt_regs * regs);
+extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
+extern atomic_t nmi_active;
extern unsigned int nmi_watchdog;
#define NMI_DEFAULT -1
#define NMI_NONE 0
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index 0dc051a..541b3e2 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -411,8 +411,6 @@
static inline int set_kernel_exec(unsigned long vaddr, int enable) { return 0;}
#endif
-extern void noexec_setup(const char *str);
-
#if defined(CONFIG_HIGHPTE)
#define pte_offset_map(dir, address) \
((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE0) + pte_index(address))
diff --git a/include/asm-i386/ptrace.h b/include/asm-i386/ptrace.h
index 1910880..a4a0e52 100644
--- a/include/asm-i386/ptrace.h
+++ b/include/asm-i386/ptrace.h
@@ -27,6 +27,7 @@
#ifdef __KERNEL__
#include <asm/vm86.h>
+#include <asm/segment.h>
struct task_struct;
extern void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs, int error_code);
@@ -40,18 +41,14 @@
*/
static inline int user_mode(struct pt_regs *regs)
{
- return (regs->xcs & 3) != 0;
+ return (regs->xcs & SEGMENT_RPL_MASK) == USER_RPL;
}
static inline int user_mode_vm(struct pt_regs *regs)
{
- return ((regs->xcs & 3) | (regs->eflags & VM_MASK)) != 0;
+ return ((regs->xcs & SEGMENT_RPL_MASK) | (regs->eflags & VM_MASK)) >= USER_RPL;
}
#define instruction_pointer(regs) ((regs)->eip)
-#if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
extern unsigned long profile_pc(struct pt_regs *regs);
-#else
-#define profile_pc(regs) instruction_pointer(regs)
-#endif
#endif /* __KERNEL__ */
#endif
diff --git a/include/asm-i386/rwlock.h b/include/asm-i386/rwlock.h
index 87c069c..c3e5db3 100644
--- a/include/asm-i386/rwlock.h
+++ b/include/asm-i386/rwlock.h
@@ -20,52 +20,6 @@
#define RW_LOCK_BIAS 0x01000000
#define RW_LOCK_BIAS_STR "0x01000000"
-#define __build_read_lock_ptr(rw, helper) \
- asm volatile(LOCK_PREFIX " subl $1,(%0)\n\t" \
- "jns 1f\n" \
- "call " helper "\n\t" \
- "1:\n" \
- ::"a" (rw) : "memory")
-
-#define __build_read_lock_const(rw, helper) \
- asm volatile(LOCK_PREFIX " subl $1,%0\n\t" \
- "jns 1f\n" \
- "pushl %%eax\n\t" \
- "leal %0,%%eax\n\t" \
- "call " helper "\n\t" \
- "popl %%eax\n\t" \
- "1:\n" \
- :"+m" (*(volatile int *)rw) : : "memory")
-
-#define __build_read_lock(rw, helper) do { \
- if (__builtin_constant_p(rw)) \
- __build_read_lock_const(rw, helper); \
- else \
- __build_read_lock_ptr(rw, helper); \
- } while (0)
-
-#define __build_write_lock_ptr(rw, helper) \
- asm volatile(LOCK_PREFIX " subl $" RW_LOCK_BIAS_STR ",(%0)\n\t" \
- "jz 1f\n" \
- "call " helper "\n\t" \
- "1:\n" \
- ::"a" (rw) : "memory")
-
-#define __build_write_lock_const(rw, helper) \
- asm volatile(LOCK_PREFIX " subl $" RW_LOCK_BIAS_STR ",%0\n\t" \
- "jz 1f\n" \
- "pushl %%eax\n\t" \
- "leal %0,%%eax\n\t" \
- "call " helper "\n\t" \
- "popl %%eax\n\t" \
- "1:\n" \
- :"+m" (*(volatile int *)rw) : : "memory")
-
-#define __build_write_lock(rw, helper) do { \
- if (__builtin_constant_p(rw)) \
- __build_write_lock_const(rw, helper); \
- else \
- __build_write_lock_ptr(rw, helper); \
- } while (0)
+/* Code is in asm-i386/spinlock.h */
#endif
diff --git a/include/asm-i386/rwsem.h b/include/asm-i386/rwsem.h
index 43113f5..bc598d6 100644
--- a/include/asm-i386/rwsem.h
+++ b/include/asm-i386/rwsem.h
@@ -99,17 +99,9 @@
__asm__ __volatile__(
"# beginning down_read\n\t"
LOCK_PREFIX " incl (%%eax)\n\t" /* adds 0x00000001, returns the old value */
- " js 2f\n\t" /* jump if we weren't granted the lock */
+ " jns 1f\n"
+ " call call_rwsem_down_read_failed\n"
"1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " pushl %%ecx\n\t"
- " pushl %%edx\n\t"
- " call rwsem_down_read_failed\n\t"
- " popl %%edx\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
"# ending down_read\n\t"
: "+m" (sem->count)
: "a" (sem)
@@ -151,15 +143,9 @@
"# beginning down_write\n\t"
LOCK_PREFIX " xadd %%edx,(%%eax)\n\t" /* subtract 0x0000ffff, returns the old value */
" testl %%edx,%%edx\n\t" /* was the count 0 before? */
- " jnz 2f\n\t" /* jump if we weren't granted the lock */
- "1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " pushl %%ecx\n\t"
- " call rwsem_down_write_failed\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
+ " jz 1f\n"
+ " call call_rwsem_down_write_failed\n"
+ "1:\n"
"# ending down_write"
: "+m" (sem->count), "=d" (tmp)
: "a" (sem), "1" (tmp)
@@ -193,17 +179,9 @@
__asm__ __volatile__(
"# beginning __up_read\n\t"
LOCK_PREFIX " xadd %%edx,(%%eax)\n\t" /* subtracts 1, returns the old value */
- " js 2f\n\t" /* jump if the lock is being waited upon */
- "1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " decw %%dx\n\t" /* do nothing if still outstanding active readers */
- " jnz 1b\n\t"
- " pushl %%ecx\n\t"
- " call rwsem_wake\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
+ " jns 1f\n\t"
+ " call call_rwsem_wake\n"
+ "1:\n"
"# ending __up_read\n"
: "+m" (sem->count), "=d" (tmp)
: "a" (sem), "1" (tmp)
@@ -219,17 +197,9 @@
"# beginning __up_write\n\t"
" movl %2,%%edx\n\t"
LOCK_PREFIX " xaddl %%edx,(%%eax)\n\t" /* tries to transition 0xffff0001 -> 0x00000000 */
- " jnz 2f\n\t" /* jump if the lock is being waited upon */
+ " jz 1f\n"
+ " call call_rwsem_wake\n"
"1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " decw %%dx\n\t" /* did the active count reduce to 0? */
- " jnz 1b\n\t" /* jump back if not */
- " pushl %%ecx\n\t"
- " call rwsem_wake\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
"# ending __up_write\n"
: "+m" (sem->count)
: "a" (sem), "i" (-RWSEM_ACTIVE_WRITE_BIAS)
@@ -244,17 +214,9 @@
__asm__ __volatile__(
"# beginning __downgrade_write\n\t"
LOCK_PREFIX " addl %2,(%%eax)\n\t" /* transitions 0xZZZZ0001 -> 0xYYYY0001 */
- " js 2f\n\t" /* jump if the lock is being waited upon */
+ " jns 1f\n\t"
+ " call call_rwsem_downgrade_wake\n"
"1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " pushl %%ecx\n\t"
- " pushl %%edx\n\t"
- " call rwsem_downgrade_wake\n\t"
- " popl %%edx\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
"# ending __downgrade_write\n"
: "+m" (sem->count)
: "a" (sem), "i" (-RWSEM_WAITING_BIAS)
diff --git a/include/asm-i386/segment.h b/include/asm-i386/segment.h
index faf9953..b7ab596 100644
--- a/include/asm-i386/segment.h
+++ b/include/asm-i386/segment.h
@@ -83,6 +83,11 @@
#define GDT_SIZE (GDT_ENTRIES * 8)
+/* Matches __KERNEL_CS and __USER_CS (they must be 2 entries apart) */
+#define SEGMENT_IS_FLAT_CODE(x) (((x) & 0xec) == GDT_ENTRY_KERNEL_CS * 8)
+/* Matches PNP_CS32 and PNP_CS16 (they must be consecutive) */
+#define SEGMENT_IS_PNP_CODE(x) (((x) & 0xf4) == GDT_ENTRY_PNPBIOS_BASE * 8)
+
/* Simple and small GDT entries for booting only */
#define GDT_ENTRY_BOOT_CS 2
@@ -112,4 +117,16 @@
*/
#define IDT_ENTRIES 256
+/* Bottom two bits of selector give the ring privilege level */
+#define SEGMENT_RPL_MASK 0x3
+/* Bit 2 is table indicator (LDT/GDT) */
+#define SEGMENT_TI_MASK 0x4
+
+/* User mode is privilege level 3 */
+#define USER_RPL 0x3
+/* LDT segment has TI set, GDT has it cleared */
+#define SEGMENT_LDT 0x4
+#define SEGMENT_GDT 0x0
+
+#define get_kernel_rpl() 0
#endif
diff --git a/include/asm-i386/semaphore.h b/include/asm-i386/semaphore.h
index d51e800..e63b6a68 100644
--- a/include/asm-i386/semaphore.h
+++ b/include/asm-i386/semaphore.h
@@ -100,13 +100,10 @@
__asm__ __volatile__(
"# atomic down operation\n\t"
LOCK_PREFIX "decl %0\n\t" /* --sem->count */
- "js 2f\n"
- "1:\n"
- LOCK_SECTION_START("")
- "2:\tlea %0,%%eax\n\t"
- "call __down_failed\n\t"
- "jmp 1b\n"
- LOCK_SECTION_END
+ "jns 2f\n"
+ "\tlea %0,%%eax\n\t"
+ "call __down_failed\n"
+ "2:"
:"+m" (sem->count)
:
:"memory","ax");
@@ -123,15 +120,12 @@
might_sleep();
__asm__ __volatile__(
"# atomic interruptible down operation\n\t"
+ "xorl %0,%0\n\t"
LOCK_PREFIX "decl %1\n\t" /* --sem->count */
- "js 2f\n\t"
- "xorl %0,%0\n"
- "1:\n"
- LOCK_SECTION_START("")
- "2:\tlea %1,%%eax\n\t"
- "call __down_failed_interruptible\n\t"
- "jmp 1b\n"
- LOCK_SECTION_END
+ "jns 2f\n\t"
+ "lea %1,%%eax\n\t"
+ "call __down_failed_interruptible\n"
+ "2:"
:"=a" (result), "+m" (sem->count)
:
:"memory");
@@ -148,15 +142,12 @@
__asm__ __volatile__(
"# atomic interruptible down operation\n\t"
+ "xorl %0,%0\n\t"
LOCK_PREFIX "decl %1\n\t" /* --sem->count */
- "js 2f\n\t"
- "xorl %0,%0\n"
- "1:\n"
- LOCK_SECTION_START("")
- "2:\tlea %1,%%eax\n\t"
+ "jns 2f\n\t"
+ "lea %1,%%eax\n\t"
"call __down_failed_trylock\n\t"
- "jmp 1b\n"
- LOCK_SECTION_END
+ "2:\n"
:"=a" (result), "+m" (sem->count)
:
:"memory");
@@ -166,22 +157,16 @@
/*
* Note! This is subtle. We jump to wake people up only if
* the semaphore was negative (== somebody was waiting on it).
- * The default case (no contention) will result in NO
- * jumps for both down() and up().
*/
static inline void up(struct semaphore * sem)
{
__asm__ __volatile__(
"# atomic up operation\n\t"
LOCK_PREFIX "incl %0\n\t" /* ++sem->count */
- "jle 2f\n"
- "1:\n"
- LOCK_SECTION_START("")
- "2:\tlea %0,%%eax\n\t"
- "call __up_wakeup\n\t"
- "jmp 1b\n"
- LOCK_SECTION_END
- ".subsection 0\n"
+ "jg 1f\n\t"
+ "lea %0,%%eax\n\t"
+ "call __up_wakeup\n"
+ "1:"
:"+m" (sem->count)
:
:"memory","ax");
diff --git a/include/asm-i386/smp.h b/include/asm-i386/smp.h
index 142d10e..32ac8c9 100644
--- a/include/asm-i386/smp.h
+++ b/include/asm-i386/smp.h
@@ -80,17 +80,12 @@
return GET_APIC_ID(*(unsigned long *)(APIC_BASE+APIC_ID));
}
#endif
-
-static __inline int logical_smp_processor_id(void)
-{
- /* we don't want to mark this access volatile - bad code generation */
- return GET_APIC_LOGICAL_ID(*(unsigned long *)(APIC_BASE+APIC_LDR));
-}
-
#endif
extern int __cpu_disable(void);
extern void __cpu_die(unsigned int cpu);
+extern unsigned int num_processors;
+
#endif /* !__ASSEMBLY__ */
#else /* CONFIG_SMP */
@@ -100,4 +95,15 @@
#define NO_PROC_ID 0xFF /* No processor magic marker */
#endif
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_X86_LOCAL_APIC
+static __inline int logical_smp_processor_id(void)
+{
+ /* we don't want to mark this access volatile - bad code generation */
+ return GET_APIC_LOGICAL_ID(*(unsigned long *)(APIC_BASE+APIC_LDR));
+}
+#endif
+#endif
+
#endif
diff --git a/include/asm-i386/spinlock.h b/include/asm-i386/spinlock.h
index d102036..b0b3043 100644
--- a/include/asm-i386/spinlock.h
+++ b/include/asm-i386/spinlock.h
@@ -4,8 +4,12 @@
#include <asm/atomic.h>
#include <asm/rwlock.h>
#include <asm/page.h>
+#include <asm/processor.h>
#include <linux/compiler.h>
+#define CLI_STRING "cli"
+#define STI_STRING "sti"
+
/*
* Your basic SMP spinlocks, allowing only a single CPU anywhere
*
@@ -17,67 +21,64 @@
* (the type definitions are in asm/spinlock_types.h)
*/
-#define __raw_spin_is_locked(x) \
- (*(volatile signed char *)(&(x)->slock) <= 0)
-
-#define __raw_spin_lock_string \
- "\n1:\t" \
- LOCK_PREFIX " ; decb %0\n\t" \
- "jns 3f\n" \
- "2:\t" \
- "rep;nop\n\t" \
- "cmpb $0,%0\n\t" \
- "jle 2b\n\t" \
- "jmp 1b\n" \
- "3:\n\t"
-
-/*
- * NOTE: there's an irqs-on section here, which normally would have to be
- * irq-traced, but on CONFIG_TRACE_IRQFLAGS we never use
- * __raw_spin_lock_string_flags().
- */
-#define __raw_spin_lock_string_flags \
- "\n1:\t" \
- LOCK_PREFIX " ; decb %0\n\t" \
- "jns 5f\n" \
- "2:\t" \
- "testl $0x200, %1\n\t" \
- "jz 4f\n\t" \
- "sti\n" \
- "3:\t" \
- "rep;nop\n\t" \
- "cmpb $0, %0\n\t" \
- "jle 3b\n\t" \
- "cli\n\t" \
- "jmp 1b\n" \
- "4:\t" \
- "rep;nop\n\t" \
- "cmpb $0, %0\n\t" \
- "jg 1b\n\t" \
- "jmp 4b\n" \
- "5:\n\t"
+static inline int __raw_spin_is_locked(raw_spinlock_t *x)
+{
+ return *(volatile signed char *)(&(x)->slock) <= 0;
+}
static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
- asm(__raw_spin_lock_string : "+m" (lock->slock) : : "memory");
+ asm volatile("\n1:\t"
+ LOCK_PREFIX " ; decb %0\n\t"
+ "jns 3f\n"
+ "2:\t"
+ "rep;nop\n\t"
+ "cmpb $0,%0\n\t"
+ "jle 2b\n\t"
+ "jmp 1b\n"
+ "3:\n\t"
+ : "+m" (lock->slock) : : "memory");
}
/*
* It is easier for the lock validator if interrupts are not re-enabled
* in the middle of a lock-acquire. This is a performance feature anyway
* so we turn it off:
+ *
+ * NOTE: there's an irqs-on section here, which normally would have to be
+ * irq-traced, but on CONFIG_TRACE_IRQFLAGS we never use this variant.
*/
#ifndef CONFIG_PROVE_LOCKING
static inline void __raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long flags)
{
- asm(__raw_spin_lock_string_flags : "+m" (lock->slock) : "r" (flags) : "memory");
+ asm volatile(
+ "\n1:\t"
+ LOCK_PREFIX " ; decb %0\n\t"
+ "jns 5f\n"
+ "2:\t"
+ "testl $0x200, %1\n\t"
+ "jz 4f\n\t"
+ STI_STRING "\n"
+ "3:\t"
+ "rep;nop\n\t"
+ "cmpb $0, %0\n\t"
+ "jle 3b\n\t"
+ CLI_STRING "\n\t"
+ "jmp 1b\n"
+ "4:\t"
+ "rep;nop\n\t"
+ "cmpb $0, %0\n\t"
+ "jg 1b\n\t"
+ "jmp 4b\n"
+ "5:\n\t"
+ : "+m" (lock->slock) : "r" (flags) : "memory");
}
#endif
static inline int __raw_spin_trylock(raw_spinlock_t *lock)
{
char oldval;
- __asm__ __volatile__(
+ asm volatile(
"xchgb %b0,%1"
:"=q" (oldval), "+m" (lock->slock)
:"0" (0) : "memory");
@@ -93,38 +94,29 @@
#if !defined(CONFIG_X86_OOSTORE) && !defined(CONFIG_X86_PPRO_FENCE)
-#define __raw_spin_unlock_string \
- "movb $1,%0" \
- :"+m" (lock->slock) : : "memory"
-
-
static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
- __asm__ __volatile__(
- __raw_spin_unlock_string
- );
+ asm volatile("movb $1,%0" : "+m" (lock->slock) :: "memory");
}
#else
-#define __raw_spin_unlock_string \
- "xchgb %b0, %1" \
- :"=q" (oldval), "+m" (lock->slock) \
- :"0" (oldval) : "memory"
-
static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
char oldval = 1;
- __asm__ __volatile__(
- __raw_spin_unlock_string
- );
+ asm volatile("xchgb %b0, %1"
+ : "=q" (oldval), "+m" (lock->slock)
+ : "0" (oldval) : "memory");
}
#endif
-#define __raw_spin_unlock_wait(lock) \
- do { while (__raw_spin_is_locked(lock)) cpu_relax(); } while (0)
+static inline void __raw_spin_unlock_wait(raw_spinlock_t *lock)
+{
+ while (__raw_spin_is_locked(lock))
+ cpu_relax();
+}
/*
* Read-write spinlocks, allowing multiple readers
@@ -151,22 +143,36 @@
* read_can_lock - would read_trylock() succeed?
* @lock: the rwlock in question.
*/
-#define __raw_read_can_lock(x) ((int)(x)->lock > 0)
+static inline int __raw_read_can_lock(raw_rwlock_t *x)
+{
+ return (int)(x)->lock > 0;
+}
/**
* write_can_lock - would write_trylock() succeed?
* @lock: the rwlock in question.
*/
-#define __raw_write_can_lock(x) ((x)->lock == RW_LOCK_BIAS)
+static inline int __raw_write_can_lock(raw_rwlock_t *x)
+{
+ return (x)->lock == RW_LOCK_BIAS;
+}
static inline void __raw_read_lock(raw_rwlock_t *rw)
{
- __build_read_lock(rw, "__read_lock_failed");
+ asm volatile(LOCK_PREFIX " subl $1,(%0)\n\t"
+ "jns 1f\n"
+ "call __read_lock_failed\n\t"
+ "1:\n"
+ ::"a" (rw) : "memory");
}
static inline void __raw_write_lock(raw_rwlock_t *rw)
{
- __build_write_lock(rw, "__write_lock_failed");
+ asm volatile(LOCK_PREFIX " subl $" RW_LOCK_BIAS_STR ",(%0)\n\t"
+ "jz 1f\n"
+ "call __write_lock_failed\n\t"
+ "1:\n"
+ ::"a" (rw) : "memory");
}
static inline int __raw_read_trylock(raw_rwlock_t *lock)
diff --git a/include/asm-i386/stacktrace.h b/include/asm-i386/stacktrace.h
new file mode 100644
index 0000000..7d1f6a5
--- /dev/null
+++ b/include/asm-i386/stacktrace.h
@@ -0,0 +1 @@
+#include <asm-x86_64/stacktrace.h>
diff --git a/include/asm-i386/therm_throt.h b/include/asm-i386/therm_throt.h
new file mode 100644
index 0000000..399bf60
--- /dev/null
+++ b/include/asm-i386/therm_throt.h
@@ -0,0 +1,9 @@
+#ifndef __ASM_I386_THERM_THROT_H__
+#define __ASM_I386_THERM_THROT_H__ 1
+
+#include <asm/atomic.h>
+
+extern atomic_t therm_throt_en;
+int therm_throt_process(int curr);
+
+#endif /* __ASM_I386_THERM_THROT_H__ */
diff --git a/include/asm-i386/tlbflush.h b/include/asm-i386/tlbflush.h
index d57ca5c..360648b 100644
--- a/include/asm-i386/tlbflush.h
+++ b/include/asm-i386/tlbflush.h
@@ -36,8 +36,6 @@
: "memory"); \
} while (0)
-extern unsigned long pgkern_mask;
-
# define __flush_tlb_all() \
do { \
if (cpu_has_pge) \
@@ -49,7 +47,7 @@
#define cpu_has_invlpg (boot_cpu_data.x86 > 3)
#define __flush_tlb_single(addr) \
- __asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
+ __asm__ __volatile__("invlpg (%0)" ::"r" (addr) : "memory")
#ifdef CONFIG_X86_INVLPG
# define __flush_tlb_one(addr) __flush_tlb_single(addr)
diff --git a/include/asm-i386/tsc.h b/include/asm-i386/tsc.h
index 97b828c..c139331 100644
--- a/include/asm-i386/tsc.h
+++ b/include/asm-i386/tsc.h
@@ -6,7 +6,6 @@
#ifndef _ASM_i386_TSC_H
#define _ASM_i386_TSC_H
-#include <linux/config.h>
#include <asm/processor.h>
/*
diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index fc1c8dd..565d089 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -323,10 +323,11 @@
#define __NR_tee 315
#define __NR_vmsplice 316
#define __NR_move_pages 317
+#define __NR_getcpu 318
#ifdef __KERNEL__
-#define NR_syscalls 318
+#define NR_syscalls 319
/*
* user-visible error numbers are in the range -1 - -128: see
diff --git a/include/asm-i386/unwind.h b/include/asm-i386/unwind.h
index 4c1a0b9..5031d69 100644
--- a/include/asm-i386/unwind.h
+++ b/include/asm-i386/unwind.h
@@ -18,6 +18,7 @@
{
struct pt_regs regs;
struct task_struct *task;
+ unsigned call_frame:1;
};
#define UNW_PC(frame) (frame)->regs.eip
@@ -28,6 +29,8 @@
#define FRAME_LINK_OFFSET 0
#define STACK_BOTTOM(tsk) STACK_LIMIT((tsk)->thread.esp0)
#define STACK_TOP(tsk) ((tsk)->thread.esp0)
+#else
+#define UNW_FP(frame) ((void)(frame), 0)
#endif
#define STACK_LIMIT(ptr) (((ptr) - 1) & ~(THREAD_SIZE - 1))
@@ -42,6 +45,10 @@
PTREGS_INFO(edi), \
PTREGS_INFO(eip)
+#define UNW_DEFAULT_RA(raItem, dataAlign) \
+ ((raItem).where == Memory && \
+ !((raItem).value * (dataAlign) + 4))
+
static inline void arch_unw_init_frame_info(struct unwind_frame_info *info,
/*const*/ struct pt_regs *regs)
{
@@ -88,6 +95,7 @@
#define UNW_PC(frame) ((void)(frame), 0)
#define UNW_SP(frame) ((void)(frame), 0)
+#define UNW_FP(frame) ((void)(frame), 0)
static inline int arch_unw_user_mode(const void *info)
{
diff --git a/include/asm-ia64/module.h b/include/asm-ia64/module.h
index 85c82bd..d2da61e 100644
--- a/include/asm-ia64/module.h
+++ b/include/asm-ia64/module.h
@@ -28,7 +28,8 @@
#define Elf_Ehdr Elf64_Ehdr
#define MODULE_PROC_FAMILY "ia64"
-#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY
+#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY \
+ "gcc-" __stringify(__GNUC__) "." __stringify(__GNUC_MINOR__)
#define ARCH_SHF_SMALL SHF_IA_64_SHORT
diff --git a/include/asm-um/alternative-asm.i b/include/asm-um/alternative-asm.i
new file mode 100644
index 0000000..cae9fac
--- /dev/null
+++ b/include/asm-um/alternative-asm.i
@@ -0,0 +1,6 @@
+#ifndef __UM_ALTERNATIVE_ASM_I
+#define __UM_ALTERNATIVE_ASM_I
+
+#include "asm/arch/alternative-asm.i"
+
+#endif
diff --git a/include/asm-um/frame.i b/include/asm-um/frame.i
new file mode 100644
index 0000000..09d5dca
--- /dev/null
+++ b/include/asm-um/frame.i
@@ -0,0 +1,6 @@
+#ifndef __UM_FRAME_I
+#define __UM_FRAME_I
+
+#include "asm/arch/frame.i"
+
+#endif
diff --git a/include/asm-x86_64/acpi.h b/include/asm-x86_64/acpi.h
index 2c95a31..ed59aa4 100644
--- a/include/asm-x86_64/acpi.h
+++ b/include/asm-x86_64/acpi.h
@@ -155,8 +155,6 @@
#endif /*CONFIG_ACPI_SLEEP*/
-#define boot_cpu_physical_apicid boot_cpu_id
-
extern int acpi_disabled;
extern int acpi_pci_disabled;
diff --git a/include/asm-x86_64/alternative-asm.i b/include/asm-x86_64/alternative-asm.i
new file mode 100644
index 0000000..e4041f4
--- /dev/null
+++ b/include/asm-x86_64/alternative-asm.i
@@ -0,0 +1,14 @@
+#include <linux/config.h>
+
+#ifdef CONFIG_SMP
+ .macro LOCK_PREFIX
+1: lock
+ .section .smp_locks,"a"
+ .align 8
+ .quad 1b
+ .previous
+ .endm
+#else
+ .macro LOCK_PREFIX
+ .endm
+#endif
diff --git a/include/asm-x86_64/apic.h b/include/asm-x86_64/apic.h
index 9c96a0a..9e66d32 100644
--- a/include/asm-x86_64/apic.h
+++ b/include/asm-x86_64/apic.h
@@ -17,6 +17,8 @@
extern int apic_verbosity;
extern int apic_runs_main_timer;
+extern int ioapic_force;
+extern int apic_mapped;
/*
* Define the default level of output to be very little
@@ -29,8 +31,6 @@
printk(s, ##a); \
} while (0)
-#ifdef CONFIG_X86_LOCAL_APIC
-
struct pt_regs;
/*
@@ -95,17 +95,12 @@
#define K8_APIC_EXT_INT_MSG_EXT 0x7
#define K8_APIC_EXT_LVT_ENTRY_THRESHOLD 0
-extern int disable_timer_pin_1;
-
-
void smp_send_timer_broadcast_ipi(void);
void switch_APIC_timer_to_ipi(void *cpumask);
void switch_ipi_to_APIC_timer(void *cpumask);
#define ARCH_APICTIMER_STOPS_ON_C3 1
-#endif /* CONFIG_X86_LOCAL_APIC */
-
extern unsigned boot_cpu_id;
#endif /* __ASM_APIC_H */
diff --git a/include/asm-x86_64/bitops.h b/include/asm-x86_64/bitops.h
index f7ba57b..5b535ea 100644
--- a/include/asm-x86_64/bitops.h
+++ b/include/asm-x86_64/bitops.h
@@ -399,6 +399,8 @@
return r+1;
}
+#define ARCH_HAS_FAST_MULTIPLIER 1
+
#include <asm-generic/bitops/hweight.h>
#endif /* __KERNEL__ */
diff --git a/include/asm-x86_64/calgary.h b/include/asm-x86_64/calgary.h
index 4e39195..6b93f5a 100644
--- a/include/asm-x86_64/calgary.h
+++ b/include/asm-x86_64/calgary.h
@@ -24,7 +24,6 @@
#ifndef _ASM_X86_64_CALGARY_H
#define _ASM_X86_64_CALGARY_H
-#include <linux/config.h>
#include <linux/spinlock.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>
@@ -34,12 +33,12 @@
unsigned long it_base; /* mapped address of tce table */
unsigned long it_hint; /* Hint for next alloc */
unsigned long *it_map; /* A simple allocation bitmap for now */
+ void __iomem *bbar; /* Bridge BAR */
+ u64 tar_val; /* Table Address Register */
+ struct timer_list watchdog_timer;
spinlock_t it_lock; /* Protects it_map */
unsigned int it_size; /* Size of iommu table in entries */
unsigned char it_busno; /* Bus number this table belongs to */
- void __iomem *bbar;
- u64 tar_val;
- struct timer_list watchdog_timer;
};
#define TCE_TABLE_SIZE_UNSPECIFIED ~0
diff --git a/include/asm-x86_64/dwarf2.h b/include/asm-x86_64/dwarf2.h
index 0744db7..eedc085 100644
--- a/include/asm-x86_64/dwarf2.h
+++ b/include/asm-x86_64/dwarf2.h
@@ -13,7 +13,7 @@
away for older version.
*/
-#ifdef CONFIG_UNWIND_INFO
+#ifdef CONFIG_AS_CFI
#define CFI_STARTPROC .cfi_startproc
#define CFI_ENDPROC .cfi_endproc
@@ -28,6 +28,11 @@
#define CFI_REMEMBER_STATE .cfi_remember_state
#define CFI_RESTORE_STATE .cfi_restore_state
#define CFI_UNDEFINED .cfi_undefined
+#ifdef CONFIG_AS_CFI_SIGNAL_FRAME
+#define CFI_SIGNAL_FRAME .cfi_signal_frame
+#else
+#define CFI_SIGNAL_FRAME
+#endif
#else
@@ -45,6 +50,7 @@
#define CFI_REMEMBER_STATE #
#define CFI_RESTORE_STATE #
#define CFI_UNDEFINED #
+#define CFI_SIGNAL_FRAME #
#endif
diff --git a/include/asm-x86_64/e820.h b/include/asm-x86_64/e820.h
index f656748..e15d3c8 100644
--- a/include/asm-x86_64/e820.h
+++ b/include/asm-x86_64/e820.h
@@ -19,13 +19,9 @@
#define E820_RAM 1
#define E820_RESERVED 2
-#define E820_ACPI 3 /* usable as RAM once ACPI tables have been read */
+#define E820_ACPI 3
#define E820_NVS 4
-#define HIGH_MEMORY (1024*1024)
-
-#define LOWMEMSIZE() (0x9f000)
-
#ifndef __ASSEMBLY__
struct e820entry {
u64 addr; /* start of memory segment */
@@ -56,8 +52,7 @@
extern unsigned long e820_hole_size(unsigned long start_pfn,
unsigned long end_pfn);
-extern void __init parse_memopt(char *p, char **end);
-extern void __init parse_memmapopt(char *p, char **end);
+extern void finish_e820_parsing(void);
extern struct e820map e820;
diff --git a/include/asm-x86_64/fixmap.h b/include/asm-x86_64/fixmap.h
index 0b4ffbd..1b620db 100644
--- a/include/asm-x86_64/fixmap.h
+++ b/include/asm-x86_64/fixmap.h
@@ -37,13 +37,9 @@
VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE + ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1,
VSYSCALL_HPET,
FIX_HPET_BASE,
-#ifdef CONFIG_X86_LOCAL_APIC
FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */
-#endif
-#ifdef CONFIG_X86_IO_APIC
FIX_IO_APIC_BASE_0,
FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS-1,
-#endif
__end_of_fixed_addresses
};
diff --git a/include/asm-x86_64/genapic.h b/include/asm-x86_64/genapic.h
index 50b38e7..81e7146 100644
--- a/include/asm-x86_64/genapic.h
+++ b/include/asm-x86_64/genapic.h
@@ -16,7 +16,6 @@
char *name;
u32 int_delivery_mode;
u32 int_dest_mode;
- u32 int_delivery_dest; /* for quick IPIs */
int (*apic_id_registered)(void);
cpumask_t (*target_cpus)(void);
void (*init_apic_ldr)(void);
diff --git a/include/asm-x86_64/i387.h b/include/asm-x86_64/i387.h
index cba8a3b..0217b74 100644
--- a/include/asm-x86_64/i387.h
+++ b/include/asm-x86_64/i387.h
@@ -24,6 +24,7 @@
extern void mxcsr_feature_mask_init(void);
extern void init_fpu(struct task_struct *child);
extern int save_i387(struct _fpstate __user *buf);
+extern asmlinkage void math_state_restore(void);
/*
* FPU lazy state save handling...
@@ -31,7 +32,9 @@
#define unlazy_fpu(tsk) do { \
if (task_thread_info(tsk)->status & TS_USEDFPU) \
- save_init_fpu(tsk); \
+ save_init_fpu(tsk); \
+ else \
+ tsk->fpu_counter = 0; \
} while (0)
/* Ignore delayed exceptions from user space */
@@ -134,8 +137,8 @@
#else
: [fx] "cdaSDb" (fx), "0" (0));
#endif
- if (unlikely(err))
- __clear_user(fx, sizeof(struct i387_fxsave_struct));
+ if (unlikely(err) && __clear_user(fx, sizeof(struct i387_fxsave_struct)))
+ err = -EFAULT;
/* No need to clear here because the caller clears USED_MATH */
return err;
}
diff --git a/include/asm-x86_64/intel_arch_perfmon.h b/include/asm-x86_64/intel_arch_perfmon.h
index 59c3964..8633331 100644
--- a/include/asm-x86_64/intel_arch_perfmon.h
+++ b/include/asm-x86_64/intel_arch_perfmon.h
@@ -14,6 +14,18 @@
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL (0x3c)
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK (0x00 << 8)
-#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT (1 << 0)
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX (0)
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \
+ (1 << (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX))
+
+union cpuid10_eax {
+ struct {
+ unsigned int version_id:8;
+ unsigned int num_counters:8;
+ unsigned int bit_width:8;
+ unsigned int mask_length:8;
+ } split;
+ unsigned int full;
+};
#endif /* X86_64_INTEL_ARCH_PERFMON_H */
diff --git a/include/asm-x86_64/io_apic.h b/include/asm-x86_64/io_apic.h
index fb7a090..5d1b5c6 100644
--- a/include/asm-x86_64/io_apic.h
+++ b/include/asm-x86_64/io_apic.h
@@ -10,8 +10,6 @@
* Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar
*/
-#ifdef CONFIG_X86_IO_APIC
-
#ifdef CONFIG_PCI_MSI
static inline int use_pci_vector(void) {return 1;}
static inline void disable_edge_ioapic_vector(unsigned int vector) { }
@@ -209,10 +207,6 @@
extern int sis_apic_bug; /* dummy */
-#else /* !CONFIG_X86_IO_APIC */
-#define io_apic_assign_pci_irqs 0
-#endif
-
extern int assign_irq_vector(int irq);
void enable_NMI_through_LVT0 (void * dummy);
diff --git a/include/asm-x86_64/irq.h b/include/asm-x86_64/irq.h
index 9db5a1b..43469d8a 100644
--- a/include/asm-x86_64/irq.h
+++ b/include/asm-x86_64/irq.h
@@ -44,9 +44,7 @@
return ((irq == 2) ? 9 : irq);
}
-#ifdef CONFIG_X86_LOCAL_APIC
#define ARCH_HAS_NMI_WATCHDOG /* See include/linux/nmi.h */
-#endif
#ifdef CONFIG_HOTPLUG_CPU
#include <linux/cpumask.h>
diff --git a/include/asm-x86_64/kexec.h b/include/asm-x86_64/kexec.h
index c564bae..5fab957 100644
--- a/include/asm-x86_64/kexec.h
+++ b/include/asm-x86_64/kexec.h
@@ -1,6 +1,27 @@
#ifndef _X86_64_KEXEC_H
#define _X86_64_KEXEC_H
+#define PA_CONTROL_PAGE 0
+#define VA_CONTROL_PAGE 1
+#define PA_PGD 2
+#define VA_PGD 3
+#define PA_PUD_0 4
+#define VA_PUD_0 5
+#define PA_PMD_0 6
+#define VA_PMD_0 7
+#define PA_PTE_0 8
+#define VA_PTE_0 9
+#define PA_PUD_1 10
+#define VA_PUD_1 11
+#define PA_PMD_1 12
+#define VA_PMD_1 13
+#define PA_PTE_1 14
+#define VA_PTE_1 15
+#define PA_TABLE_PAGE 16
+#define PAGES_NR 17
+
+#ifndef __ASSEMBLY__
+
#include <linux/string.h>
#include <asm/page.h>
@@ -64,4 +85,12 @@
newregs->rip = (unsigned long)current_text_addr();
}
}
+
+NORET_TYPE void
+relocate_kernel(unsigned long indirection_page,
+ unsigned long page_list,
+ unsigned long start_address) ATTRIB_NORET;
+
+#endif /* __ASSEMBLY__ */
+
#endif /* _X86_64_KEXEC_H */
diff --git a/include/asm-x86_64/linkage.h b/include/asm-x86_64/linkage.h
index 291c2d0..b5f39d0 100644
--- a/include/asm-x86_64/linkage.h
+++ b/include/asm-x86_64/linkage.h
@@ -1,6 +1,6 @@
#ifndef __ASM_LINKAGE_H
#define __ASM_LINKAGE_H
-/* Nothing to see here... */
+#define __ALIGN .p2align 4,,15
#endif
diff --git a/include/asm-x86_64/mach_apic.h b/include/asm-x86_64/mach_apic.h
index 0acea44..d334224 100644
--- a/include/asm-x86_64/mach_apic.h
+++ b/include/asm-x86_64/mach_apic.h
@@ -16,7 +16,6 @@
#define INT_DELIVERY_MODE (genapic->int_delivery_mode)
#define INT_DEST_MODE (genapic->int_dest_mode)
-#define INT_DELIVERY_DEST (genapic->int_delivery_dest)
#define TARGET_CPUS (genapic->target_cpus())
#define apic_id_registered (genapic->apic_id_registered)
#define init_apic_ldr (genapic->init_apic_ldr)
diff --git a/include/asm-x86_64/mce.h b/include/asm-x86_64/mce.h
index d13687d..5a11146 100644
--- a/include/asm-x86_64/mce.h
+++ b/include/asm-x86_64/mce.h
@@ -99,6 +99,8 @@
}
#endif
+void mce_log_therm_throt_event(unsigned int cpu, __u64 status);
+
extern atomic_t mce_entry;
#endif
diff --git a/include/asm-x86_64/mmx.h b/include/asm-x86_64/mmx.h
deleted file mode 100644
index 46b71da..0000000
--- a/include/asm-x86_64/mmx.h
+++ /dev/null
@@ -1,14 +0,0 @@
-#ifndef _ASM_MMX_H
-#define _ASM_MMX_H
-
-/*
- * MMX 3Dnow! helper operations
- */
-
-#include <linux/types.h>
-
-extern void *_mmx_memcpy(void *to, const void *from, size_t size);
-extern void mmx_clear_page(void *page);
-extern void mmx_copy_page(void *to, void *from);
-
-#endif
diff --git a/include/asm-x86_64/mpspec.h b/include/asm-x86_64/mpspec.h
index 14fc3dd..017fddb 100644
--- a/include/asm-x86_64/mpspec.h
+++ b/include/asm-x86_64/mpspec.h
@@ -159,13 +159,7 @@
#define MAX_MP_BUSSES 256
/* Each PCI slot may be a combo card with its own bus. 4 IRQ pins per slot. */
#define MAX_IRQ_SOURCES (MAX_MP_BUSSES * 4)
-enum mp_bustype {
- MP_BUS_ISA = 1,
- MP_BUS_EISA,
- MP_BUS_PCI,
- MP_BUS_MCA
-};
-extern unsigned char mp_bus_id_to_type [MAX_MP_BUSSES];
+extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);
extern int mp_bus_id_to_pci_bus [MAX_MP_BUSSES];
extern unsigned int boot_cpu_physical_apicid;
@@ -178,18 +172,15 @@
extern struct mpc_config_intsrc mp_irqs [MAX_IRQ_SOURCES];
extern int mpc_default_type;
extern unsigned long mp_lapic_addr;
-extern int pic_mode;
#ifdef CONFIG_ACPI
extern void mp_register_lapic (u8 id, u8 enabled);
extern void mp_register_lapic_address (u64 address);
-#ifdef CONFIG_X86_IO_APIC
extern void mp_register_ioapic (u8 id, u32 address, u32 gsi_base);
extern void mp_override_legacy_irq (u8 bus_irq, u8 polarity, u8 trigger, u32 gsi);
extern void mp_config_acpi_legacy_irqs (void);
extern int mp_register_gsi (u32 gsi, int triggering, int polarity);
-#endif /*CONFIG_X86_IO_APIC*/
#endif
extern int using_apic_timer;
diff --git a/include/asm-x86_64/msr.h b/include/asm-x86_64/msr.h
index 10f8b51..37e1941 100644
--- a/include/asm-x86_64/msr.h
+++ b/include/asm-x86_64/msr.h
@@ -66,14 +66,25 @@
#define rdtscl(low) \
__asm__ __volatile__ ("rdtsc" : "=a" (low) : : "edx")
+#define rdtscp(low,high,aux) \
+ asm volatile (".byte 0x0f,0x01,0xf9" : "=a" (low), "=d" (high), "=c" (aux))
+
#define rdtscll(val) do { \
unsigned int __a,__d; \
asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \
(val) = ((unsigned long)__a) | (((unsigned long)__d)<<32); \
} while(0)
+#define rdtscpll(val, aux) do { \
+ unsigned long __a, __d; \
+ asm volatile (".byte 0x0f,0x01,0xf9" : "=a" (__a), "=d" (__d), "=c" (aux)); \
+ (val) = (__d << 32) | __a; \
+} while (0)
+
#define write_tsc(val1,val2) wrmsr(0x10, val1, val2)
+#define write_rdtscp_aux(val) wrmsr(0xc0000103, val, 0)
+
#define rdpmc(counter,low,high) \
__asm__ __volatile__("rdpmc" \
: "=a" (low), "=d" (high) \
diff --git a/include/asm-x86_64/mutex.h b/include/asm-x86_64/mutex.h
index 06fab6d..16396b1 100644
--- a/include/asm-x86_64/mutex.h
+++ b/include/asm-x86_64/mutex.h
@@ -25,13 +25,9 @@
\
__asm__ __volatile__( \
LOCK_PREFIX " decl (%%rdi) \n" \
- " js 2f \n" \
- "1: \n" \
- \
- LOCK_SECTION_START("") \
- "2: call "#fail_fn" \n" \
- " jmp 1b \n" \
- LOCK_SECTION_END \
+ " jns 1f \n" \
+ " call "#fail_fn" \n" \
+ "1:" \
\
:"=D" (dummy) \
: "D" (v) \
@@ -75,13 +71,9 @@
\
__asm__ __volatile__( \
LOCK_PREFIX " incl (%%rdi) \n" \
- " jle 2f \n" \
- "1: \n" \
- \
- LOCK_SECTION_START("") \
- "2: call "#fail_fn" \n" \
- " jmp 1b \n" \
- LOCK_SECTION_END \
+ " jg 1f \n" \
+ " call "#fail_fn" \n" \
+ "1: " \
\
:"=D" (dummy) \
: "D" (v) \
diff --git a/include/asm-x86_64/nmi.h b/include/asm-x86_64/nmi.h
index efb45c8..cbf2669 100644
--- a/include/asm-x86_64/nmi.h
+++ b/include/asm-x86_64/nmi.h
@@ -7,24 +7,13 @@
#include <linux/pm.h>
#include <asm/io.h>
-struct pt_regs;
-
-typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu);
-
/**
- * set_nmi_callback
+ * do_nmi_callback
*
- * Set a handler for an NMI. Only one handler may be
- * set. Return 1 if the NMI was handled.
+ * Check to see if a callback exists and execute it. Return 1
+ * if the handler exists and was handled successfully.
*/
-void set_nmi_callback(nmi_callback_t callback);
-
-/**
- * unset_nmi_callback
- *
- * Remove the handler previously set.
- */
-void unset_nmi_callback(void);
+int do_nmi_callback(struct pt_regs *regs, int cpu);
#ifdef CONFIG_PM
@@ -48,25 +37,32 @@
#endif /* CONFIG_PM */
extern void default_do_nmi(struct pt_regs *);
-extern void die_nmi(char *str, struct pt_regs *regs);
+extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
#define get_nmi_reason() inb(0x61)
extern int panic_on_timeout;
extern int unknown_nmi_panic;
+extern int nmi_watchdog_enabled;
extern int check_nmi_watchdog(void);
-
-extern void setup_apic_nmi_watchdog (void);
-extern int reserve_lapic_nmi(void);
-extern void release_lapic_nmi(void);
+extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
+extern int avail_to_resrv_perfctr_nmi(unsigned int);
+extern int reserve_perfctr_nmi(unsigned int);
+extern void release_perfctr_nmi(unsigned int);
+extern int reserve_evntsel_nmi(unsigned int);
+extern void release_evntsel_nmi(unsigned int);
+
+extern void setup_apic_nmi_watchdog (void *);
+extern void stop_apic_nmi_watchdog (void *);
extern void disable_timer_nmi_watchdog(void);
extern void enable_timer_nmi_watchdog(void);
-extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
+extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
extern void nmi_watchdog_default(void);
extern int setup_nmi_watchdog(char *);
+extern atomic_t nmi_active;
extern unsigned int nmi_watchdog;
#define NMI_DEFAULT -1
#define NMI_NONE 0
diff --git a/include/asm-x86_64/pci-direct.h b/include/asm-x86_64/pci-direct.h
index 036b6ca..eba9cb4 100644
--- a/include/asm-x86_64/pci-direct.h
+++ b/include/asm-x86_64/pci-direct.h
@@ -2,47 +2,15 @@
#define ASM_PCI_DIRECT_H 1
#include <linux/types.h>
-#include <asm/io.h>
/* Direct PCI access. This is used for PCI accesses in early boot before
the PCI subsystem works. */
-#define PDprintk(x...)
+extern u32 read_pci_config(u8 bus, u8 slot, u8 func, u8 offset);
+extern u8 read_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset);
+extern u16 read_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset);
+extern void write_pci_config(u8 bus, u8 slot, u8 func, u8 offset, u32 val);
-static inline u32 read_pci_config(u8 bus, u8 slot, u8 func, u8 offset)
-{
- u32 v;
- outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
- v = inl(0xcfc);
- if (v != 0xffffffff)
- PDprintk("%x reading 4 from %x: %x\n", slot, offset, v);
- return v;
-}
-
-static inline u8 read_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset)
-{
- u8 v;
- outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
- v = inb(0xcfc + (offset&3));
- PDprintk("%x reading 1 from %x: %x\n", slot, offset, v);
- return v;
-}
-
-static inline u16 read_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset)
-{
- u16 v;
- outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
- v = inw(0xcfc + (offset&2));
- PDprintk("%x reading 2 from %x: %x\n", slot, offset, v);
- return v;
-}
-
-static inline void write_pci_config(u8 bus, u8 slot, u8 func, u8 offset,
- u32 val)
-{
- PDprintk("%x writing to %x: %x\n", slot, offset, val);
- outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
- outl(val, 0xcfc);
-}
+extern int early_pci_allowed(void);
#endif
diff --git a/include/asm-x86_64/pda.h b/include/asm-x86_64/pda.h
index b47c3df..14996d9 100644
--- a/include/asm-x86_64/pda.h
+++ b/include/asm-x86_64/pda.h
@@ -9,20 +9,24 @@
/* Per processor datastructure. %gs points to it while the kernel runs */
struct x8664_pda {
- struct task_struct *pcurrent; /* Current process */
- unsigned long data_offset; /* Per cpu data offset from linker address */
- unsigned long kernelstack; /* top of kernel stack for current */
- unsigned long oldrsp; /* user rsp for system call */
-#if DEBUG_STKSZ > EXCEPTION_STKSZ
- unsigned long debugstack; /* #DB/#BP stack. */
+ struct task_struct *pcurrent; /* 0 Current process */
+ unsigned long data_offset; /* 8 Per cpu data offset from linker
+ address */
+ unsigned long kernelstack; /* 16 top of kernel stack for current */
+ unsigned long oldrsp; /* 24 user rsp for system call */
+ int irqcount; /* 32 Irq nesting counter. Starts with -1 */
+ int cpunumber; /* 36 Logical CPU number */
+#ifdef CONFIG_CC_STACKPROTECTOR
+ unsigned long stack_canary; /* 40 stack canary value */
+ /* gcc-ABI: this canary MUST be at
+ offset 40!!! */
#endif
- int irqcount; /* Irq nesting counter. Starts with -1 */
- int cpunumber; /* Logical CPU number */
- char *irqstackptr; /* top of irqstack */
+ char *irqstackptr;
int nodenumber; /* number of current node */
unsigned int __softirq_pending;
unsigned int __nmi_count; /* number of NMI on this CPUs */
- int mmu_state;
+ short mmu_state;
+ short isidle;
struct mm_struct *active_mm;
unsigned apic_timer_irqs;
} ____cacheline_aligned_in_smp;
@@ -36,44 +40,69 @@
* There is no fast way to get the base address of the PDA, all the accesses
* have to mention %fs/%gs. So it needs to be done this Torvaldian way.
*/
-#define sizeof_field(type,field) (sizeof(((type *)0)->field))
-#define typeof_field(type,field) typeof(((type *)0)->field)
+extern void __bad_pda_field(void) __attribute__((noreturn));
-extern void __bad_pda_field(void);
+/*
+ * proxy_pda doesn't actually exist, but tell gcc it is accessed for
+ * all PDA accesses so it gets read/write dependencies right.
+ */
+extern struct x8664_pda _proxy_pda;
#define pda_offset(field) offsetof(struct x8664_pda, field)
-#define pda_to_op(op,field,val) do { \
- typedef typeof_field(struct x8664_pda, field) T__; \
- switch (sizeof_field(struct x8664_pda, field)) { \
-case 2: \
-asm volatile(op "w %0,%%gs:%P1"::"ri" ((T__)val),"i"(pda_offset(field)):"memory"); break; \
-case 4: \
-asm volatile(op "l %0,%%gs:%P1"::"ri" ((T__)val),"i"(pda_offset(field)):"memory"); break; \
-case 8: \
-asm volatile(op "q %0,%%gs:%P1"::"ri" ((T__)val),"i"(pda_offset(field)):"memory"); break; \
- default: __bad_pda_field(); \
- } \
+#define pda_to_op(op,field,val) do { \
+ typedef typeof(_proxy_pda.field) T__; \
+ if (0) { T__ tmp__; tmp__ = (val); } /* type checking */ \
+ switch (sizeof(_proxy_pda.field)) { \
+ case 2: \
+ asm(op "w %1,%%gs:%c2" : \
+ "+m" (_proxy_pda.field) : \
+ "ri" ((T__)val), \
+ "i"(pda_offset(field))); \
+ break; \
+ case 4: \
+ asm(op "l %1,%%gs:%c2" : \
+ "+m" (_proxy_pda.field) : \
+ "ri" ((T__)val), \
+ "i" (pda_offset(field))); \
+ break; \
+ case 8: \
+ asm(op "q %1,%%gs:%c2": \
+ "+m" (_proxy_pda.field) : \
+ "ri" ((T__)val), \
+ "i"(pda_offset(field))); \
+ break; \
+ default: \
+ __bad_pda_field(); \
+ } \
} while (0)
-/*
- * AK: PDA read accesses should be neither volatile nor have an memory clobber.
- * Unfortunately removing them causes all hell to break lose currently.
- */
-#define pda_from_op(op,field) ({ \
- typeof_field(struct x8664_pda, field) ret__; \
- switch (sizeof_field(struct x8664_pda, field)) { \
-case 2: \
-asm volatile(op "w %%gs:%P1,%0":"=r" (ret__):"i"(pda_offset(field)):"memory"); break;\
-case 4: \
-asm volatile(op "l %%gs:%P1,%0":"=r" (ret__):"i"(pda_offset(field)):"memory"); break;\
-case 8: \
-asm volatile(op "q %%gs:%P1,%0":"=r" (ret__):"i"(pda_offset(field)):"memory"); break;\
- default: __bad_pda_field(); \
- } \
+#define pda_from_op(op,field) ({ \
+ typeof(_proxy_pda.field) ret__; \
+ switch (sizeof(_proxy_pda.field)) { \
+ case 2: \
+ asm(op "w %%gs:%c1,%0" : \
+ "=r" (ret__) : \
+ "i" (pda_offset(field)), \
+ "m" (_proxy_pda.field)); \
+ break; \
+ case 4: \
+ asm(op "l %%gs:%c1,%0": \
+ "=r" (ret__): \
+ "i" (pda_offset(field)), \
+ "m" (_proxy_pda.field)); \
+ break; \
+ case 8: \
+ asm(op "q %%gs:%c1,%0": \
+ "=r" (ret__) : \
+ "i" (pda_offset(field)), \
+ "m" (_proxy_pda.field)); \
+ break; \
+ default: \
+ __bad_pda_field(); \
+ } \
ret__; })
-
#define read_pda(field) pda_from_op("mov",field)
#define write_pda(field,val) pda_to_op("mov",field,val)
#define add_pda(field,val) pda_to_op("add",field,val)
diff --git a/include/asm-x86_64/percpu.h b/include/asm-x86_64/percpu.h
index bffb2f8..2857560 100644
--- a/include/asm-x86_64/percpu.h
+++ b/include/asm-x86_64/percpu.h
@@ -11,6 +11,16 @@
#include <asm/pda.h>
+#ifdef CONFIG_MODULES
+# define PERCPU_MODULE_RESERVE 8192
+#else
+# define PERCPU_MODULE_RESERVE 0
+#endif
+
+#define PERCPU_ENOUGH_ROOM \
+ (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
+ PERCPU_MODULE_RESERVE)
+
#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
#define __my_cpu_offset() read_pda(data_offset)
diff --git a/include/asm-x86_64/pgtable.h b/include/asm-x86_64/pgtable.h
index 51eba23..6899e77 100644
--- a/include/asm-x86_64/pgtable.h
+++ b/include/asm-x86_64/pgtable.h
@@ -21,12 +21,9 @@
#define swapper_pg_dir init_level4_pgt
-extern int nonx_setup(char *str);
extern void paging_init(void);
extern void clear_kernel_mapping(unsigned long addr, unsigned long size);
-extern unsigned long pgkern_mask;
-
/*
* ZERO_PAGE is a global shared page that is always zero: used
* for zero-mapped memory areas etc..
@@ -265,7 +262,7 @@
#define __LARGE_PTE (_PAGE_PSE|_PAGE_PRESENT)
static inline int pte_user(pte_t pte) { return pte_val(pte) & _PAGE_USER; }
static inline int pte_read(pte_t pte) { return pte_val(pte) & _PAGE_USER; }
-static inline int pte_exec(pte_t pte) { return pte_val(pte) & _PAGE_USER; }
+static inline int pte_exec(pte_t pte) { return !(pte_val(pte) & _PAGE_NX); }
static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; }
static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; }
static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW; }
@@ -278,11 +275,12 @@
static inline pte_t pte_mkold(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_ACCESSED)); return pte; }
static inline pte_t pte_wrprotect(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_RW)); return pte; }
static inline pte_t pte_mkread(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_USER)); return pte; }
-static inline pte_t pte_mkexec(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_USER)); return pte; }
+static inline pte_t pte_mkexec(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_NX)); return pte; }
static inline pte_t pte_mkdirty(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_DIRTY)); return pte; }
static inline pte_t pte_mkyoung(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_ACCESSED)); return pte; }
static inline pte_t pte_mkwrite(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_RW)); return pte; }
static inline pte_t pte_mkhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_PSE)); return pte; }
+static inline pte_t pte_clrhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_PSE)); return pte; }
struct vm_area_struct;
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index 038fe1f4..b73d0c7 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -51,10 +51,8 @@
extern int sysctl_vsyscall;
extern int nohpet;
extern unsigned long vxtime_hz;
+extern void time_init_gtod(void);
-extern int numa_setup(char *opt);
-
-extern int setup_early_printk(char *);
extern void early_printk(const char *fmt, ...) __attribute__((format(printf,1,2)));
extern void early_identify_cpu(struct cpuinfo_x86 *c);
@@ -91,7 +89,7 @@
extern void setup_node_bootmem(int nodeid, unsigned long start, unsigned long end);
-extern void check_ioapic(void);
+extern void early_quirks(void);
extern void check_efer(void);
extern int unhandled_signal(struct task_struct *tsk, int sig);
@@ -103,13 +101,7 @@
extern unsigned long table_start, table_end;
extern int exception_trace;
-extern int using_apic_timer;
-extern int disable_apic;
extern unsigned cpu_khz;
-extern int ioapic_force;
-extern int skip_ioapic_setup;
-extern int acpi_ht;
-extern int acpi_disabled;
extern void no_iommu_init(void);
extern int force_iommu, no_iommu;
@@ -131,7 +123,8 @@
extern int reboot_force;
extern int notsc_setup(char *);
-extern int setup_additional_cpus(char *);
+
+extern int gsi_irq_sharing(int gsi);
extern void smp_local_timer_interrupt(struct pt_regs * regs);
diff --git a/include/asm-x86_64/rwlock.h b/include/asm-x86_64/rwlock.h
index dea0e94..72aeebe 100644
--- a/include/asm-x86_64/rwlock.h
+++ b/include/asm-x86_64/rwlock.h
@@ -18,69 +18,9 @@
#ifndef _ASM_X86_64_RWLOCK_H
#define _ASM_X86_64_RWLOCK_H
-#include <linux/stringify.h>
-
#define RW_LOCK_BIAS 0x01000000
-#define RW_LOCK_BIAS_STR "0x01000000"
+#define RW_LOCK_BIAS_STR "0x01000000"
-#define __build_read_lock_ptr(rw, helper) \
- asm volatile(LOCK_PREFIX "subl $1,(%0)\n\t" \
- "js 2f\n" \
- "1:\n" \
- LOCK_SECTION_START("") \
- "2:\tcall " helper "\n\t" \
- "jmp 1b\n" \
- LOCK_SECTION_END \
- ::"a" (rw) : "memory")
-
-#define __build_read_lock_const(rw, helper) \
- asm volatile(LOCK_PREFIX "subl $1,%0\n\t" \
- "js 2f\n" \
- "1:\n" \
- LOCK_SECTION_START("") \
- "2:\tpushq %%rax\n\t" \
- "leaq %0,%%rax\n\t" \
- "call " helper "\n\t" \
- "popq %%rax\n\t" \
- "jmp 1b\n" \
- LOCK_SECTION_END \
- :"=m" (*((volatile int *)rw))::"memory")
-
-#define __build_read_lock(rw, helper) do { \
- if (__builtin_constant_p(rw)) \
- __build_read_lock_const(rw, helper); \
- else \
- __build_read_lock_ptr(rw, helper); \
- } while (0)
-
-#define __build_write_lock_ptr(rw, helper) \
- asm volatile(LOCK_PREFIX "subl $" RW_LOCK_BIAS_STR ",(%0)\n\t" \
- "jnz 2f\n" \
- "1:\n" \
- LOCK_SECTION_START("") \
- "2:\tcall " helper "\n\t" \
- "jmp 1b\n" \
- LOCK_SECTION_END \
- ::"a" (rw) : "memory")
-
-#define __build_write_lock_const(rw, helper) \
- asm volatile(LOCK_PREFIX "subl $" RW_LOCK_BIAS_STR ",%0\n\t" \
- "jnz 2f\n" \
- "1:\n" \
- LOCK_SECTION_START("") \
- "2:\tpushq %%rax\n\t" \
- "leaq %0,%%rax\n\t" \
- "call " helper "\n\t" \
- "popq %%rax\n\t" \
- "jmp 1b\n" \
- LOCK_SECTION_END \
- :"=m" (*((volatile long *)rw))::"memory")
-
-#define __build_write_lock(rw, helper) do { \
- if (__builtin_constant_p(rw)) \
- __build_write_lock_const(rw, helper); \
- else \
- __build_write_lock_ptr(rw, helper); \
- } while (0)
+/* Actual code is in asm/spinlock.h or in arch/x86_64/lib/rwlock.S */
#endif
diff --git a/include/asm-x86_64/segment.h b/include/asm-x86_64/segment.h
index d4bed33..334ddcd 100644
--- a/include/asm-x86_64/segment.h
+++ b/include/asm-x86_64/segment.h
@@ -20,15 +20,16 @@
#define __USER_CS 0x33 /* 6*8+3 */
#define __USER32_DS __USER_DS
-#define GDT_ENTRY_TLS 1
#define GDT_ENTRY_TSS 8 /* needs two entries */
#define GDT_ENTRY_LDT 10 /* needs two entries */
#define GDT_ENTRY_TLS_MIN 12
#define GDT_ENTRY_TLS_MAX 14
-/* 15 free */
#define GDT_ENTRY_TLS_ENTRIES 3
+#define GDT_ENTRY_PER_CPU 15 /* Abused to load per CPU data from limit */
+#define __PER_CPU_SEG (GDT_ENTRY_PER_CPU * 8 + 3)
+
/* TLS indexes for 64bit - hardcoded in arch_prctl */
#define FS_TLS 0
#define GS_TLS 1
diff --git a/include/asm-x86_64/semaphore.h b/include/asm-x86_64/semaphore.h
index 064df08..107bd90 100644
--- a/include/asm-x86_64/semaphore.h
+++ b/include/asm-x86_64/semaphore.h
@@ -107,12 +107,9 @@
__asm__ __volatile__(
"# atomic down operation\n\t"
LOCK_PREFIX "decl %0\n\t" /* --sem->count */
- "js 2f\n"
- "1:\n"
- LOCK_SECTION_START("")
- "2:\tcall __down_failed\n\t"
- "jmp 1b\n"
- LOCK_SECTION_END
+ "jns 1f\n\t"
+ "call __down_failed\n"
+ "1:"
:"=m" (sem->count)
:"D" (sem)
:"memory");
@@ -130,14 +127,11 @@
__asm__ __volatile__(
"# atomic interruptible down operation\n\t"
+ "xorl %0,%0\n\t"
LOCK_PREFIX "decl %1\n\t" /* --sem->count */
- "js 2f\n\t"
- "xorl %0,%0\n"
- "1:\n"
- LOCK_SECTION_START("")
- "2:\tcall __down_failed_interruptible\n\t"
- "jmp 1b\n"
- LOCK_SECTION_END
+ "jns 2f\n\t"
+ "call __down_failed_interruptible\n"
+ "2:\n"
:"=a" (result), "=m" (sem->count)
:"D" (sem)
:"memory");
@@ -154,14 +148,11 @@
__asm__ __volatile__(
"# atomic interruptible down operation\n\t"
+ "xorl %0,%0\n\t"
LOCK_PREFIX "decl %1\n\t" /* --sem->count */
- "js 2f\n\t"
- "xorl %0,%0\n"
- "1:\n"
- LOCK_SECTION_START("")
- "2:\tcall __down_failed_trylock\n\t"
- "jmp 1b\n"
- LOCK_SECTION_END
+ "jns 2f\n\t"
+ "call __down_failed_trylock\n\t"
+ "2:\n"
:"=a" (result), "=m" (sem->count)
:"D" (sem)
:"memory","cc");
@@ -179,12 +170,9 @@
__asm__ __volatile__(
"# atomic up operation\n\t"
LOCK_PREFIX "incl %0\n\t" /* ++sem->count */
- "jle 2f\n"
- "1:\n"
- LOCK_SECTION_START("")
- "2:\tcall __up_wakeup\n\t"
- "jmp 1b\n"
- LOCK_SECTION_END
+ "jg 1f\n\t"
+ "call __up_wakeup\n"
+ "1:"
:"=m" (sem->count)
:"D" (sem)
:"memory");
diff --git a/include/asm-x86_64/signal.h b/include/asm-x86_64/signal.h
index 3ede2a6..4581f97 100644
--- a/include/asm-x86_64/signal.h
+++ b/include/asm-x86_64/signal.h
@@ -24,10 +24,6 @@
} sigset_t;
-struct pt_regs;
-asmlinkage int do_signal(struct pt_regs *regs, sigset_t *oldset);
-
-
#else
/* Here we must cater to libcs that poke about in kernel headers. */
diff --git a/include/asm-x86_64/smp.h b/include/asm-x86_64/smp.h
index ce97f65..d6b7c05 100644
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -4,27 +4,18 @@
/*
* We need the APIC definitions automatically as part of 'smp.h'
*/
-#ifndef __ASSEMBLY__
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/bitops.h>
extern int disable_apic;
-#endif
-#ifdef CONFIG_X86_LOCAL_APIC
-#ifndef __ASSEMBLY__
#include <asm/fixmap.h>
#include <asm/mpspec.h>
-#ifdef CONFIG_X86_IO_APIC
#include <asm/io_apic.h>
-#endif
#include <asm/apic.h>
#include <asm/thread_info.h>
-#endif
-#endif
#ifdef CONFIG_SMP
-#ifndef ASSEMBLY
#include <asm/pda.h>
@@ -42,7 +33,6 @@
extern void smp_alloc_memory(void);
extern volatile unsigned long smp_invalidate_needed;
-extern int pic_mode;
extern void lock_ipi_call_lock(void);
extern void unlock_ipi_call_lock(void);
extern int smp_num_siblings;
@@ -74,20 +64,16 @@
return GET_APIC_ID(*(unsigned int *)(APIC_BASE+APIC_ID));
}
-extern int safe_smp_processor_id(void);
extern int __cpu_disable(void);
extern void __cpu_die(unsigned int cpu);
extern void prefill_possible_map(void);
extern unsigned num_processors;
extern unsigned disabled_cpus;
-#endif /* !ASSEMBLY */
-
#define NO_PROC_ID 0xFF /* No processor magic marker */
#endif
-#ifndef ASSEMBLY
/*
* Some lowlevel functions might want to know about
* the real APIC ID <-> CPU # mapping.
@@ -109,11 +95,8 @@
return BAD_APICID;
}
-#endif /* !ASSEMBLY */
-
#ifndef CONFIG_SMP
#define stack_smp_processor_id() 0
-#define safe_smp_processor_id() 0
#define cpu_logical_map(x) (x)
#else
#include <asm/thread_info.h>
@@ -125,19 +108,23 @@
})
#endif
-#ifndef __ASSEMBLY__
static __inline int logical_smp_processor_id(void)
{
/* we don't want to mark this access volatile - bad code generation */
return GET_APIC_LOGICAL_ID(*(unsigned long *)(APIC_BASE+APIC_LDR));
}
-#endif
#ifdef CONFIG_SMP
#define cpu_physical_id(cpu) x86_cpu_to_apicid[cpu]
#else
#define cpu_physical_id(cpu) boot_cpu_id
-#endif
-
+static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
+ void *info, int retry, int wait)
+{
+ /* Disable interrupts here? */
+ func(info);
+ return 0;
+}
+#endif /* !CONFIG_SMP */
#endif
diff --git a/include/asm-x86_64/spinlock.h b/include/asm-x86_64/spinlock.h
index 248a79f..be7a9e6 100644
--- a/include/asm-x86_64/spinlock.h
+++ b/include/asm-x86_64/spinlock.h
@@ -16,31 +16,23 @@
* (the type definitions are in asm/spinlock_types.h)
*/
-#define __raw_spin_is_locked(x) \
- (*(volatile signed int *)(&(x)->slock) <= 0)
-
-#define __raw_spin_lock_string \
- "\n1:\t" \
- LOCK_PREFIX " ; decl %0\n\t" \
- "js 2f\n" \
- LOCK_SECTION_START("") \
- "2:\t" \
- "rep;nop\n\t" \
- "cmpl $0,%0\n\t" \
- "jle 2b\n\t" \
- "jmp 1b\n" \
- LOCK_SECTION_END
-
-#define __raw_spin_lock_string_up \
- "\n\tdecl %0"
-
-#define __raw_spin_unlock_string \
- "movl $1,%0" \
- :"=m" (lock->slock) : : "memory"
+static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
+{
+ return *(volatile signed int *)(&(lock)->slock) <= 0;
+}
static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
- asm volatile(__raw_spin_lock_string : "=m" (lock->slock) : : "memory");
+ asm volatile(
+ "\n1:\t"
+ LOCK_PREFIX " ; decl %0\n\t"
+ "jns 2f\n"
+ "3:\n"
+ "rep;nop\n\t"
+ "cmpl $0,%0\n\t"
+ "jle 3b\n\t"
+ "jmp 1b\n"
+ "2:\t" : "=m" (lock->slock) : : "memory");
}
#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
@@ -49,7 +41,7 @@
{
int oldval;
- __asm__ __volatile__(
+ asm volatile(
"xchgl %0,%1"
:"=q" (oldval), "=m" (lock->slock)
:"0" (0) : "memory");
@@ -59,13 +51,14 @@
static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
- __asm__ __volatile__(
- __raw_spin_unlock_string
- );
+ asm volatile("movl $1,%0" :"=m" (lock->slock) :: "memory");
}
-#define __raw_spin_unlock_wait(lock) \
- do { while (__raw_spin_is_locked(lock)) cpu_relax(); } while (0)
+static inline void __raw_spin_unlock_wait(raw_spinlock_t *lock)
+{
+ while (__raw_spin_is_locked(lock))
+ cpu_relax();
+}
/*
* Read-write spinlocks, allowing multiple readers
@@ -79,26 +72,34 @@
*
* On x86, we implement read-write locks as a 32-bit counter
* with the high bit (sign) being the "contended" bit.
- *
- * The inline assembly is non-obvious. Think about it.
- *
- * Changed to use the same technique as rw semaphores. See
- * semaphore.h for details. -ben
- *
- * the helpers are in arch/i386/kernel/semaphore.c
*/
-#define __raw_read_can_lock(x) ((int)(x)->lock > 0)
-#define __raw_write_can_lock(x) ((x)->lock == RW_LOCK_BIAS)
+static inline int __raw_read_can_lock(raw_rwlock_t *lock)
+{
+ return (int)(lock)->lock > 0;
+}
+
+static inline int __raw_write_can_lock(raw_rwlock_t *lock)
+{
+ return (lock)->lock == RW_LOCK_BIAS;
+}
static inline void __raw_read_lock(raw_rwlock_t *rw)
{
- __build_read_lock(rw, "__read_lock_failed");
+ asm volatile(LOCK_PREFIX "subl $1,(%0)\n\t"
+ "jns 1f\n"
+ "call __read_lock_failed\n"
+ "1:\n"
+ ::"D" (rw), "i" (RW_LOCK_BIAS) : "memory");
}
static inline void __raw_write_lock(raw_rwlock_t *rw)
{
- __build_write_lock(rw, "__write_lock_failed");
+ asm volatile(LOCK_PREFIX "subl %1,(%0)\n\t"
+ "jz 1f\n"
+ "\tcall __write_lock_failed\n\t"
+ "1:\n"
+ ::"D" (rw), "i" (RW_LOCK_BIAS) : "memory");
}
static inline int __raw_read_trylock(raw_rwlock_t *lock)
diff --git a/include/asm-x86_64/stacktrace.h b/include/asm-x86_64/stacktrace.h
new file mode 100644
index 0000000..5eb9799
--- /dev/null
+++ b/include/asm-x86_64/stacktrace.h
@@ -0,0 +1,18 @@
+#ifndef _ASM_STACKTRACE_H
+#define _ASM_STACKTRACE_H 1
+
+/* Generic stack tracer with callbacks */
+
+struct stacktrace_ops {
+ void (*warning)(void *data, char *msg);
+ /* msg must contain %s for the symbol */
+ void (*warning_symbol)(void *data, char *msg, unsigned long symbol);
+ void (*address)(void *data, unsigned long address);
+ /* On negative return stop dumping */
+ int (*stack)(void *data, char *name);
+};
+
+void dump_trace(struct task_struct *tsk, struct pt_regs *regs, unsigned long *stack,
+ struct stacktrace_ops *ops, void *data);
+
+#endif
diff --git a/include/asm-x86_64/system.h b/include/asm-x86_64/system.h
index 6bf170b..bd376bc 100644
--- a/include/asm-x86_64/system.h
+++ b/include/asm-x86_64/system.h
@@ -14,12 +14,13 @@
#define __RESTORE(reg,offset) "movq (14-" #offset ")*8(%%rsp),%%" #reg "\n\t"
/* frame pointer must be last for get_wchan */
-#define SAVE_CONTEXT "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
-#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\n\t"
+#define SAVE_CONTEXT "pushf ; pushq %%rbp ; movq %%rsi,%%rbp\n\t"
+#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp ; popf\t"
#define __EXTRA_CLOBBER \
,"rcx","rbx","rdx","r8","r9","r10","r11","r12","r13","r14","r15"
+/* Save restore flags to clear handle leaking NT */
#define switch_to(prev,next,last) \
asm volatile(SAVE_CONTEXT \
"movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */ \
diff --git a/include/asm-x86_64/tce.h b/include/asm-x86_64/tce.h
index 53e9a68..dbb047f 100644
--- a/include/asm-x86_64/tce.h
+++ b/include/asm-x86_64/tce.h
@@ -24,7 +24,6 @@
#ifndef _ASM_X86_64_TCE_H
#define _ASM_X86_64_TCE_H
-extern void* tce_table_kva[];
extern unsigned int specified_table_size;
struct iommu_table;
diff --git a/include/asm-x86_64/therm_throt.h b/include/asm-x86_64/therm_throt.h
new file mode 100644
index 0000000..5aac059
--- /dev/null
+++ b/include/asm-x86_64/therm_throt.h
@@ -0,0 +1 @@
+#include <asm-i386/therm_throt.h>
diff --git a/include/asm-x86_64/thread_info.h b/include/asm-x86_64/thread_info.h
index 2029b00..787a081 100644
--- a/include/asm-x86_64/thread_info.h
+++ b/include/asm-x86_64/thread_info.h
@@ -114,11 +114,14 @@
#define TIF_IRET 5 /* force IRET */
#define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
#define TIF_SECCOMP 8 /* secure computing */
+#define TIF_RESTORE_SIGMASK 9 /* restore signal mask in do_signal */
/* 16 free */
#define TIF_IA32 17 /* 32bit process */
#define TIF_FORK 18 /* ret_from_fork */
#define TIF_ABI_PENDING 19
#define TIF_MEMDIE 20
+#define TIF_DEBUG 21 /* uses debug registers */
+#define TIF_IO_BITMAP 22 /* uses I/O bitmap */
#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
@@ -128,9 +131,12 @@
#define _TIF_IRET (1<<TIF_IRET)
#define _TIF_SYSCALL_AUDIT (1<<TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1<<TIF_SECCOMP)
+#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
#define _TIF_IA32 (1<<TIF_IA32)
#define _TIF_FORK (1<<TIF_FORK)
#define _TIF_ABI_PENDING (1<<TIF_ABI_PENDING)
+#define _TIF_DEBUG (1<<TIF_DEBUG)
+#define _TIF_IO_BITMAP (1<<TIF_IO_BITMAP)
/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK \
@@ -138,6 +144,9 @@
/* work to do on any return to user space */
#define _TIF_ALLWORK_MASK (0x0000FFFF & ~_TIF_SECCOMP)
+/* flags to check in __switch_to() */
+#define _TIF_WORK_CTXSW (_TIF_DEBUG|_TIF_IO_BITMAP)
+
#define PREEMPT_ACTIVE 0x10000000
/*
diff --git a/include/asm-x86_64/tlbflush.h b/include/asm-x86_64/tlbflush.h
index d16d5b6..983bd29 100644
--- a/include/asm-x86_64/tlbflush.h
+++ b/include/asm-x86_64/tlbflush.h
@@ -4,44 +4,44 @@
#include <linux/mm.h>
#include <asm/processor.h>
-#define __flush_tlb() \
- do { \
- unsigned long tmpreg; \
- \
- __asm__ __volatile__( \
- "movq %%cr3, %0; # flush TLB \n" \
- "movq %0, %%cr3; \n" \
- : "=r" (tmpreg) \
- :: "memory"); \
- } while (0)
+static inline unsigned long get_cr3(void)
+{
+ unsigned long cr3;
+ asm volatile("mov %%cr3,%0" : "=r" (cr3));
+ return cr3;
+}
-/*
- * Global pages have to be flushed a bit differently. Not a real
- * performance problem because this does not happen often.
- */
-#define __flush_tlb_global() \
- do { \
- unsigned long tmpreg, cr4, cr4_orig; \
- \
- __asm__ __volatile__( \
- "movq %%cr4, %2; # turn off PGE \n" \
- "movq %2, %1; \n" \
- "andq %3, %1; \n" \
- "movq %1, %%cr4; \n" \
- "movq %%cr3, %0; # flush TLB \n" \
- "movq %0, %%cr3; \n" \
- "movq %2, %%cr4; # turn PGE back on \n" \
- : "=&r" (tmpreg), "=&r" (cr4), "=&r" (cr4_orig) \
- : "i" (~X86_CR4_PGE) \
- : "memory"); \
- } while (0)
+static inline void set_cr3(unsigned long cr3)
+{
+ asm volatile("mov %0,%%cr3" :: "r" (cr3) : "memory");
+}
-extern unsigned long pgkern_mask;
+static inline void __flush_tlb(void)
+{
+ set_cr3(get_cr3());
+}
-#define __flush_tlb_all() __flush_tlb_global()
+static inline unsigned long get_cr4(void)
+{
+ unsigned long cr4;
+ asm volatile("mov %%cr4,%0" : "=r" (cr4));
+ return cr4;
+}
+
+static inline void set_cr4(unsigned long cr4)
+{
+ asm volatile("mov %0,%%cr4" :: "r" (cr4) : "memory");
+}
+
+static inline void __flush_tlb_all(void)
+{
+ unsigned long cr4 = get_cr4();
+ set_cr4(cr4 & ~X86_CR4_PGE); /* clear PGE */
+ set_cr4(cr4); /* write old PGE again and flush TLBs */
+}
#define __flush_tlb_one(addr) \
- __asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
+ __asm__ __volatile__("invlpg (%0)" :: "r" (addr) : "memory")
/*
diff --git a/include/asm-x86_64/uaccess.h b/include/asm-x86_64/uaccess.h
index 1e1fa00..e856570 100644
--- a/include/asm-x86_64/uaccess.h
+++ b/include/asm-x86_64/uaccess.h
@@ -84,7 +84,7 @@
*/
#define __get_user_x(size,ret,x,ptr) \
- __asm__ __volatile__("call __get_user_" #size \
+ asm volatile("call __get_user_" #size \
:"=a" (ret),"=d" (x) \
:"c" (ptr) \
:"r8")
@@ -101,7 +101,7 @@
case 8: __get_user_x(8,__ret_gu,__val_gu,ptr); break; \
default: __get_user_bad(); break; \
} \
- (x) = (__typeof__(*(ptr)))__val_gu; \
+ (x) = (typeof(*(ptr)))__val_gu; \
__ret_gu; \
})
@@ -112,7 +112,7 @@
extern void __put_user_bad(void);
#define __put_user_x(size,ret,x,ptr) \
- __asm__ __volatile__("call __put_user_" #size \
+ asm volatile("call __put_user_" #size \
:"=a" (ret) \
:"c" (ptr),"d" (x) \
:"r8")
@@ -139,7 +139,7 @@
#define __put_user_check(x,ptr,size) \
({ \
int __pu_err; \
- __typeof__(*(ptr)) __user *__pu_addr = (ptr); \
+ typeof(*(ptr)) __user *__pu_addr = (ptr); \
switch (size) { \
case 1: __put_user_x(1,__pu_err,x,__pu_addr); break; \
case 2: __put_user_x(2,__pu_err,x,__pu_addr); break; \
@@ -173,7 +173,7 @@
* aliasing issues.
*/
#define __put_user_asm(x, addr, err, itype, rtype, ltype, errno) \
- __asm__ __volatile__( \
+ asm volatile( \
"1: mov"itype" %"rtype"1,%2\n" \
"2:\n" \
".section .fixup,\"ax\"\n" \
@@ -193,7 +193,7 @@
int __gu_err; \
unsigned long __gu_val; \
__get_user_size(__gu_val,(ptr),(size),__gu_err); \
- (x) = (__typeof__(*(ptr)))__gu_val; \
+ (x) = (typeof(*(ptr)))__gu_val; \
__gu_err; \
})
@@ -217,7 +217,7 @@
} while (0)
#define __get_user_asm(x, addr, err, itype, rtype, ltype, errno) \
- __asm__ __volatile__( \
+ asm volatile( \
"1: mov"itype" %2,%"rtype"1\n" \
"2:\n" \
".section .fixup,\"ax\"\n" \
@@ -237,15 +237,20 @@
*/
/* Handles exceptions in both to and from, but doesn't do access_ok */
-extern unsigned long copy_user_generic(void *to, const void *from, unsigned len);
+__must_check unsigned long
+copy_user_generic(void *to, const void *from, unsigned len);
-extern unsigned long copy_to_user(void __user *to, const void *from, unsigned len);
-extern unsigned long copy_from_user(void *to, const void __user *from, unsigned len);
-extern unsigned long copy_in_user(void __user *to, const void __user *from, unsigned len);
+__must_check unsigned long
+copy_to_user(void __user *to, const void *from, unsigned len);
+__must_check unsigned long
+copy_from_user(void *to, const void __user *from, unsigned len);
+__must_check unsigned long
+copy_in_user(void __user *to, const void __user *from, unsigned len);
-static __always_inline int __copy_from_user(void *dst, const void __user *src, unsigned size)
+static __always_inline __must_check
+int __copy_from_user(void *dst, const void __user *src, unsigned size)
{
- int ret = 0;
+ int ret = 0;
if (!__builtin_constant_p(size))
return copy_user_generic(dst,(__force void *)src,size);
switch (size) {
@@ -272,9 +277,10 @@
}
}
-static __always_inline int __copy_to_user(void __user *dst, const void *src, unsigned size)
+static __always_inline __must_check
+int __copy_to_user(void __user *dst, const void *src, unsigned size)
{
- int ret = 0;
+ int ret = 0;
if (!__builtin_constant_p(size))
return copy_user_generic((__force void *)dst,src,size);
switch (size) {
@@ -303,10 +309,10 @@
}
}
-
-static __always_inline int __copy_in_user(void __user *dst, const void __user *src, unsigned size)
+static __always_inline __must_check
+int __copy_in_user(void __user *dst, const void __user *src, unsigned size)
{
- int ret = 0;
+ int ret = 0;
if (!__builtin_constant_p(size))
return copy_user_generic((__force void *)dst,(__force void *)src,size);
switch (size) {
@@ -344,15 +350,17 @@
}
}
-long strncpy_from_user(char *dst, const char __user *src, long count);
-long __strncpy_from_user(char *dst, const char __user *src, long count);
-long strnlen_user(const char __user *str, long n);
-long __strnlen_user(const char __user *str, long n);
-long strlen_user(const char __user *str);
-unsigned long clear_user(void __user *mem, unsigned long len);
-unsigned long __clear_user(void __user *mem, unsigned long len);
+__must_check long
+strncpy_from_user(char *dst, const char __user *src, long count);
+__must_check long
+__strncpy_from_user(char *dst, const char __user *src, long count);
+__must_check long strnlen_user(const char __user *str, long n);
+__must_check long __strnlen_user(const char __user *str, long n);
+__must_check long strlen_user(const char __user *str);
+__must_check unsigned long clear_user(void __user *mem, unsigned long len);
+__must_check unsigned long __clear_user(void __user *mem, unsigned long len);
-#define __copy_to_user_inatomic __copy_to_user
-#define __copy_from_user_inatomic __copy_from_user
+__must_check long __copy_from_user_inatomic(void *dst, const void __user *src, unsigned size);
+#define __copy_to_user_inatomic copy_user_generic
#endif /* __X86_64_UACCESS_H */
diff --git a/include/asm-x86_64/unistd.h b/include/asm-x86_64/unistd.h
index 80fd48e..eeb98c1 100644
--- a/include/asm-x86_64/unistd.h
+++ b/include/asm-x86_64/unistd.h
@@ -600,9 +600,9 @@
#define __NR_faccessat 269
__SYSCALL(__NR_faccessat, sys_faccessat)
#define __NR_pselect6 270
-__SYSCALL(__NR_pselect6, sys_ni_syscall) /* for now */
+__SYSCALL(__NR_pselect6, sys_pselect6)
#define __NR_ppoll 271
-__SYSCALL(__NR_ppoll, sys_ni_syscall) /* for now */
+__SYSCALL(__NR_ppoll, sys_ppoll)
#define __NR_unshare 272
__SYSCALL(__NR_unshare, sys_unshare)
#define __NR_set_robust_list 273
@@ -658,6 +658,7 @@
#define __ARCH_WANT_SYS_SIGPENDING
#define __ARCH_WANT_SYS_SIGPROCMASK
#define __ARCH_WANT_SYS_RT_SIGACTION
+#define __ARCH_WANT_SYS_RT_SIGSUSPEND
#define __ARCH_WANT_SYS_TIME
#define __ARCH_WANT_COMPAT_SYS_TIME
diff --git a/include/asm-x86_64/unwind.h b/include/asm-x86_64/unwind.h
index 1f6e9bf..2e7ff10 100644
--- a/include/asm-x86_64/unwind.h
+++ b/include/asm-x86_64/unwind.h
@@ -18,6 +18,7 @@
{
struct pt_regs regs;
struct task_struct *task;
+ unsigned call_frame:1;
};
#define UNW_PC(frame) (frame)->regs.rip
@@ -57,6 +58,10 @@
PTREGS_INFO(r15), \
PTREGS_INFO(rip)
+#define UNW_DEFAULT_RA(raItem, dataAlign) \
+ ((raItem).where == Memory && \
+ !((raItem).value * (dataAlign) + 8))
+
static inline void arch_unw_init_frame_info(struct unwind_frame_info *info,
/*const*/ struct pt_regs *regs)
{
@@ -94,8 +99,8 @@
#else
-#define UNW_PC(frame) ((void)(frame), 0)
-#define UNW_SP(frame) ((void)(frame), 0)
+#define UNW_PC(frame) ((void)(frame), 0UL)
+#define UNW_SP(frame) ((void)(frame), 0UL)
static inline int arch_unw_user_mode(const void *info)
{
diff --git a/include/asm-x86_64/vsyscall.h b/include/asm-x86_64/vsyscall.h
index 146b244..2281e93 100644
--- a/include/asm-x86_64/vsyscall.h
+++ b/include/asm-x86_64/vsyscall.h
@@ -4,6 +4,7 @@
enum vsyscall_num {
__NR_vgettimeofday,
__NR_vtime,
+ __NR_vgetcpu,
};
#define VSYSCALL_START (-10UL << 20)
@@ -15,6 +16,7 @@
#include <linux/seqlock.h>
#define __section_vxtime __attribute__ ((unused, __section__ (".vxtime"), aligned(16)))
+#define __section_vgetcpu_mode __attribute__ ((unused, __section__ (".vgetcpu_mode"), aligned(16)))
#define __section_wall_jiffies __attribute__ ((unused, __section__ (".wall_jiffies"), aligned(16)))
#define __section_jiffies __attribute__ ((unused, __section__ (".jiffies"), aligned(16)))
#define __section_sys_tz __attribute__ ((unused, __section__ (".sys_tz"), aligned(16)))
@@ -26,6 +28,9 @@
#define VXTIME_HPET 2
#define VXTIME_PMTMR 3
+#define VGETCPU_RDTSCP 1
+#define VGETCPU_LSL 2
+
struct vxtime_data {
long hpet_address; /* HPET base address */
int last;
@@ -40,6 +45,7 @@
/* vsyscall space (readonly) */
extern struct vxtime_data __vxtime;
+extern int __vgetcpu_mode;
extern struct timespec __xtime;
extern volatile unsigned long __jiffies;
extern unsigned long __wall_jiffies;
@@ -48,6 +54,7 @@
/* kernel space (writeable) */
extern struct vxtime_data vxtime;
+extern int vgetcpu_mode;
extern unsigned long wall_jiffies;
extern struct timezone sys_tz;
extern int sysctl_vsyscall;
@@ -55,6 +62,8 @@
extern int sysctl_vsyscall;
+extern void vsyscall_set_cpu(int cpu);
+
#define ARCH_HAVE_XTIME_LOCK 1
#endif /* __KERNEL__ */
diff --git a/include/linux/edd.h b/include/linux/edd.h
index 162512b..b2b3e68 100644
--- a/include/linux/edd.h
+++ b/include/linux/edd.h
@@ -52,6 +52,7 @@
#define EDD_CL_EQUALS 0x3d646465 /* "edd=" */
#define EDD_CL_OFF 0x666f /* "of" for off */
#define EDD_CL_SKIP 0x6b73 /* "sk" for skipmbr */
+#define EDD_CL_ON 0x6e6f /* "on" for on */
#ifndef __ASSEMBLY__
diff --git a/include/linux/getcpu.h b/include/linux/getcpu.h
new file mode 100644
index 0000000..031ed37
--- /dev/null
+++ b/include/linux/getcpu.h
@@ -0,0 +1,16 @@
+#ifndef _LINUX_GETCPU_H
+#define _LINUX_GETCPU_H 1
+
+/* Cache for getcpu() to speed it up. Results might be upto a jiffie
+ out of date, but will be faster.
+ User programs should not refer to the contents of this structure.
+ It is only a cache for vgetcpu(). It might change in future kernels.
+ The user program must store this information per thread (__thread)
+ If you want 100% accurate information pass NULL instead. */
+struct getcpu_cache {
+ unsigned long t0;
+ unsigned long t1;
+ unsigned long res[4];
+};
+
+#endif
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 329ebcf..c8d5f20 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -115,6 +115,21 @@
((long)(a) - (long)(b) >= 0))
#define time_before_eq(a,b) time_after_eq(b,a)
+/* Same as above, but does so with platform independent 64bit types.
+ * These must be used when utilizing jiffies_64 (i.e. return value of
+ * get_jiffies_64() */
+#define time_after64(a,b) \
+ (typecheck(__u64, a) && \
+ typecheck(__u64, b) && \
+ ((__s64)(b) - (__s64)(a) < 0))
+#define time_before64(a,b) time_after64(b,a)
+
+#define time_after_eq64(a,b) \
+ (typecheck(__u64, a) && \
+ typecheck(__u64, b) && \
+ ((__s64)(a) - (__s64)(b) >= 0))
+#define time_before_eq64(a,b) time_after_eq64(b,a)
+
/*
* Have the 32 bit jiffies value wrap 5 minutes after boot
* so jiffies wrap bugs show up earlier.
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e44a37e..4fa373b 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -187,6 +187,7 @@
extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */
extern int panic_timeout;
extern int panic_on_oops;
+extern int panic_on_unrecovered_nmi;
extern int tainted;
extern const char *print_tainted(void);
extern void add_taint(unsigned);
diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index 932021f..6c9873f 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -35,9 +35,13 @@
#endif
#define KPROBE_ENTRY(name) \
- .section .kprobes.text, "ax"; \
+ .pushsection .kprobes.text, "ax"; \
ENTRY(name)
+#define KPROBE_END(name) \
+ END(name); \
+ .popsection
+
#ifndef END
#define END(name) \
.size name, .-name
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 34ed0d9..9d4aa7f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -819,6 +819,11 @@
unsigned did_exec:1;
pid_t pid;
pid_t tgid;
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+ /* Canary value for the -fstack-protector gcc feature */
+ unsigned long stack_canary;
+#endif
/*
* pointers to (original) parent process, youngest child, younger sibling,
* older sibling, respectively. (p->father can be replaced with
@@ -865,6 +870,15 @@
struct key *thread_keyring; /* keyring private to this thread */
unsigned char jit_keyring; /* default keyring to attach requested keys to */
#endif
+ /*
+ * fpu_counter contains the number of consecutive context switches
+ * that the FPU is used. If this is over a threshold, the lazy fpu
+ * saving becomes unlazy to save the trap. This is an unsigned char
+ * so that after 256 times the counter wraps and the behavior turns
+ * lazy again; this to deal with bursty apps that only use FPU for
+ * a short time
+ */
+ unsigned char fpu_counter;
int oomkilladj; /* OOM kill score adjustment (bit shift). */
char comm[TASK_COMM_LEN]; /* executable name excluding path
- access with [gs]et_task_comm (which lock
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 9cc81e5..50e2b01 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -5,15 +5,16 @@
struct stack_trace {
unsigned int nr_entries, max_entries;
unsigned long *entries;
+ int skip; /* input argument: How many entries to skip */
+ int all_contexts; /* input argument: if true do than one stack */
};
extern void save_stack_trace(struct stack_trace *trace,
- struct task_struct *task, int all_contexts,
- unsigned int skip);
+ struct task_struct *task);
extern void print_stack_trace(struct stack_trace *trace, int spaces);
#else
-# define save_stack_trace(trace, task, all, skip) do { } while (0)
+# define save_stack_trace(trace, task) do { } while (0)
# define print_stack_trace(trace) do { } while (0)
#endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 008f04c..3f0f716 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -53,6 +53,7 @@
struct compat_stat;
struct compat_timeval;
struct robust_list_head;
+struct getcpu_cache;
#include <linux/types.h>
#include <linux/aio_abi.h>
@@ -596,5 +597,6 @@
size_t __user *len_ptr);
asmlinkage long sys_set_robust_list(struct robust_list_head __user *head,
size_t len);
+asmlinkage long sys_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *cache);
#endif
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index eca5557..1b24bd4 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -150,6 +150,8 @@
KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */
KERN_COMPAT_LOG=73, /* int: print compat layer messages */
KERN_MAX_LOCK_DEPTH=74,
+ KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
+ KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
};
diff --git a/include/linux/vermagic.h b/include/linux/vermagic.h
index 46919f9..4d0909e 100644
--- a/include/linux/vermagic.h
+++ b/include/linux/vermagic.h
@@ -24,5 +24,5 @@
#define VERMAGIC_STRING \
UTS_RELEASE " " \
MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \
- MODULE_VERMAGIC_MODULE_UNLOAD MODULE_ARCH_VERMAGIC \
- "gcc-" __stringify(__GNUC__) "." __stringify(__GNUC_MINOR__)
+ MODULE_VERMAGIC_MODULE_UNLOAD MODULE_ARCH_VERMAGIC
+
diff --git a/init/main.c b/init/main.c
index 8651a72..913e48d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -162,16 +162,19 @@
static int __init obsolete_checksetup(char *line)
{
struct obs_kernel_param *p;
+ int had_early_param = 0;
p = __setup_start;
do {
int n = strlen(p->str);
if (!strncmp(line, p->str, n)) {
if (p->early) {
- /* Already done in parse_early_param? (Needs
- * exact match on param part) */
+ /* Already done in parse_early_param?
+ * (Needs exact match on param part).
+ * Keep iterating, as we can have early
+ * params and __setups of same names 8( */
if (line[n] == '\0' || line[n] == '=')
- return 1;
+ had_early_param = 1;
} else if (!p->setup_func) {
printk(KERN_WARNING "Parameter %s is obsolete,"
" ignored\n", p->str);
@@ -181,7 +184,8 @@
}
p++;
} while (p < __setup_end);
- return 0;
+
+ return had_early_param;
}
/*
@@ -464,6 +468,7 @@
* Need to run as early as possible, to initialize the
* lockdep hash:
*/
+ unwind_init();
lockdep_init();
local_irq_disable();
@@ -502,7 +507,6 @@
__stop___param - __start___param,
&unknown_bootoption);
sort_main_extable();
- unwind_init();
trap_init();
rcu_init();
init_IRQ();
diff --git a/kernel/fork.c b/kernel/fork.c
index f9b014e..a0dad84 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -45,6 +45,7 @@
#include <linux/cn_proc.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
+#include <linux/random.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -175,6 +176,10 @@
tsk->thread_info = ti;
setup_thread_stack(tsk, orig);
+#ifdef CONFIG_CC_STACKPROTECTOR
+ tsk->stack_canary = get_random_int();
+#endif
+
/* One for us, one for whoever does the "release_task()" (usually parent) */
atomic_set(&tsk->usage,2);
atomic_set(&tsk->fs_excl, 0);
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 9bad178..c088e55 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -224,7 +224,14 @@
trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
trace->entries = stack_trace + nr_stack_trace_entries;
- save_stack_trace(trace, NULL, 0, 3);
+ trace->skip = 3;
+ trace->all_contexts = 0;
+
+ /* Make sure to not recurse in case the the unwinder needs to tak
+e locks. */
+ lockdep_off();
+ save_stack_trace(trace, NULL);
+ lockdep_on();
trace->max_entries = trace->nr_entries;
diff --git a/kernel/panic.c b/kernel/panic.c
index 8010b9b..6ceb664 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -21,6 +21,7 @@
#include <linux/debug_locks.h>
int panic_on_oops;
+int panic_on_unrecovered_nmi;
int tainted;
static int pause_on_oops;
static int pause_on_oops_flag;
@@ -270,3 +271,15 @@
{
do_oops_enter_exit();
}
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+/*
+ * Called when gcc's -fstack-protector feature is used, and
+ * gcc detects corruption of the on-stack canary value
+ */
+void __stack_chk_fail(void)
+{
+ panic("stack-protector: Kernel stack is corrupted");
+}
+EXPORT_SYMBOL(__stack_chk_fail);
+#endif
diff --git a/kernel/spinlock.c b/kernel/spinlock.c
index fb524b0..9644a41 100644
--- a/kernel/spinlock.c
+++ b/kernel/spinlock.c
@@ -7,6 +7,11 @@
*
* This file contains the spinlock/rwlock implementations for the
* SMP and the DEBUG_SPINLOCK cases. (UP-nondebug inlines them)
+ *
+ * Note that some architectures have special knowledge about the
+ * stack frames of these functions in their profile_pc. If you
+ * change anything significant here that could change the stack
+ * frame contact the architecture maintainers.
*/
#include <linux/linkage.h>
diff --git a/kernel/sys.c b/kernel/sys.c
index e236f98..3f89477 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -28,6 +28,7 @@
#include <linux/tty.h>
#include <linux/signal.h>
#include <linux/cn_proc.h>
+#include <linux/getcpu.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -2062,3 +2063,33 @@
}
return error;
}
+
+asmlinkage long sys_getcpu(unsigned __user *cpup, unsigned __user *nodep,
+ struct getcpu_cache __user *cache)
+{
+ int err = 0;
+ int cpu = raw_smp_processor_id();
+ if (cpup)
+ err |= put_user(cpu, cpup);
+ if (nodep)
+ err |= put_user(cpu_to_node(cpu), nodep);
+ if (cache) {
+ /*
+ * The cache is not needed for this implementation,
+ * but make sure user programs pass something
+ * valid. vsyscall implementations can instead make
+ * good use of the cache. Only use t0 and t1 because
+ * these are available in both 32bit and 64bit ABI (no
+ * need for a compat_getcpu). 32bit has enough
+ * padding
+ */
+ unsigned long t0, t1;
+ get_user(t0, &cache->t0);
+ get_user(t1, &cache->t1);
+ t0++;
+ t1++;
+ put_user(t0, &cache->t0);
+ put_user(t1, &cache->t1);
+ }
+ return err ? -EFAULT : 0;
+}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index fd43c3e..bcb3a18 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -76,8 +76,9 @@
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
int unknown_nmi_panic;
-extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
- void __user *, size_t *, loff_t *);
+int nmi_watchdog_enabled;
+extern int proc_nmi_enabled(struct ctl_table *, int , struct file *,
+ void __user *, size_t *, loff_t *);
#endif
/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
@@ -628,11 +629,27 @@
.data = &unknown_nmi_panic,
.maxlen = sizeof (int),
.mode = 0644,
- .proc_handler = &proc_unknown_nmi_panic,
+ .proc_handler = &proc_dointvec,
+ },
+ {
+ .ctl_name = KERN_NMI_WATCHDOG,
+ .procname = "nmi_watchdog",
+ .data = &nmi_watchdog_enabled,
+ .maxlen = sizeof (int),
+ .mode = 0644,
+ .proc_handler = &proc_nmi_enabled,
},
#endif
#if defined(CONFIG_X86)
{
+ .ctl_name = KERN_PANIC_ON_NMI,
+ .procname = "panic_on_unrecovered_nmi",
+ .data = &panic_on_unrecovered_nmi,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
.ctl_name = KERN_BOOTLOADER_TYPE,
.procname = "bootloader_type",
.data = &bootloader_type,
diff --git a/kernel/unwind.c b/kernel/unwind.c
index f69c804..3430475 100644
--- a/kernel/unwind.c
+++ b/kernel/unwind.c
@@ -603,6 +603,7 @@
#define FRAME_REG(r, t) (((t *)frame)[reg_info[r].offs])
const u32 *fde = NULL, *cie = NULL;
const u8 *ptr = NULL, *end = NULL;
+ unsigned long pc = UNW_PC(frame) - frame->call_frame;
unsigned long startLoc = 0, endLoc = 0, cfa;
unsigned i;
signed ptrType = -1;
@@ -612,7 +613,7 @@
if (UNW_PC(frame) == 0)
return -EINVAL;
- if ((table = find_table(UNW_PC(frame))) != NULL
+ if ((table = find_table(pc)) != NULL
&& !(table->size & (sizeof(*fde) - 1))) {
unsigned long tableSize = table->size;
@@ -647,7 +648,7 @@
ptrType & DW_EH_PE_indirect
? ptrType
: ptrType & (DW_EH_PE_FORM|DW_EH_PE_signed));
- if (UNW_PC(frame) >= startLoc && UNW_PC(frame) < endLoc)
+ if (pc >= startLoc && pc < endLoc)
break;
cie = NULL;
}
@@ -657,16 +658,28 @@
state.cieEnd = ptr; /* keep here temporarily */
ptr = (const u8 *)(cie + 2);
end = (const u8 *)(cie + 1) + *cie;
+ frame->call_frame = 1;
if ((state.version = *ptr) != 1)
cie = NULL; /* unsupported version */
else if (*++ptr) {
/* check if augmentation size is first (and thus present) */
if (*ptr == 'z') {
- /* check for ignorable (or already handled)
- * nul-terminated augmentation string */
- while (++ptr < end && *ptr)
- if (strchr("LPR", *ptr) == NULL)
+ while (++ptr < end && *ptr) {
+ switch(*ptr) {
+ /* check for ignorable (or already handled)
+ * nul-terminated augmentation string */
+ case 'L':
+ case 'P':
+ case 'R':
+ continue;
+ case 'S':
+ frame->call_frame = 0;
+ continue;
+ default:
break;
+ }
+ break;
+ }
}
if (ptr >= end || *ptr)
cie = NULL;
@@ -755,7 +768,7 @@
state.org = startLoc;
memcpy(&state.cfa, &badCFA, sizeof(state.cfa));
/* process instructions */
- if (!processCFI(ptr, end, UNW_PC(frame), ptrType, &state)
+ if (!processCFI(ptr, end, pc, ptrType, &state)
|| state.loc > endLoc
|| state.regs[retAddrReg].where == Nowhere
|| state.cfa.reg >= ARRAY_SIZE(reg_info)
@@ -763,6 +776,11 @@
|| state.cfa.offs % sizeof(unsigned long))
return -EIO;
/* update frame */
+#ifndef CONFIG_AS_CFI_SIGNAL_FRAME
+ if(frame->call_frame
+ && !UNW_DEFAULT_RA(state.regs[retAddrReg], state.dataAlign))
+ frame->call_frame = 0;
+#endif
cfa = FRAME_REG(state.cfa.reg, unsigned long) + state.cfa.offs;
startLoc = min((unsigned long)UNW_SP(frame), cfa);
endLoc = max((unsigned long)UNW_SP(frame), cfa);
@@ -866,6 +884,7 @@
/*const*/ struct pt_regs *regs)
{
info->task = tsk;
+ info->call_frame = 0;
arch_unw_init_frame_info(info, regs);
return 0;
@@ -879,6 +898,7 @@
struct task_struct *tsk)
{
info->task = tsk;
+ info->call_frame = 0;
arch_unw_init_blocked(info);
return 0;
@@ -894,6 +914,7 @@
void *arg)
{
info->task = current;
+ info->call_frame = 0;
return arch_unwind_init_running(info, callback, arg);
}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2869307c..f1ac318 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -225,7 +225,7 @@
bool
depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
select STACKTRACE
- select FRAME_POINTER
+ select FRAME_POINTER if !X86
select KALLSYMS
select KALLSYMS_ALL
diff --git a/lib/hweight.c b/lib/hweight.c
index 4382576..360556a 100644
--- a/lib/hweight.c
+++ b/lib/hweight.c
@@ -1,5 +1,6 @@
#include <linux/module.h>
#include <asm/types.h>
+#include <asm/bitops.h>
/**
* hweightN - returns the hamming weight of a N-bit word
@@ -40,14 +41,19 @@
#if BITS_PER_LONG == 32
return hweight32((unsigned int)(w >> 32)) + hweight32((unsigned int)w);
#elif BITS_PER_LONG == 64
+#ifdef ARCH_HAS_FAST_MULTIPLIER
+ w -= (w >> 1) & 0x5555555555555555ul;
+ w = (w & 0x3333333333333333ul) + ((w >> 2) & 0x3333333333333333ul);
+ w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0ful;
+ return (w * 0x0101010101010101ul) >> 56;
+#else
__u64 res = w - ((w >> 1) & 0x5555555555555555ul);
res = (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul);
res = (res + (res >> 4)) & 0x0F0F0F0F0F0F0F0Ful;
res = res + (res >> 8);
res = res + (res >> 16);
return (res + (res >> 32)) & 0x00000000000000FFul;
-#else
-#error BITS_PER_LONG not defined
+#endif
#endif
}
EXPORT_SYMBOL(hweight64);
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 3d52389..4f5ff19b 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -63,6 +63,13 @@
-xassembler /dev/null > /dev/null 2>&1; then echo "$(1)"; \
else echo "$(2)"; fi ;)
+# as-instr
+# Usage: cflags-y += $(call as-instr, instr, option1, option2)
+
+as-instr = $(shell if echo -e "$(1)" | $(AS) >/dev/null 2>&1 -W -Z -o astest$$$$.out ; \
+ then echo "$(2)"; else echo "$(3)"; fi; \
+ rm -f astest$$$$.out)
+
# cc-option
# Usage: cflags-y += $(call cc-option, -march=winchip-c6, -march=i586)
diff --git a/scripts/gcc-x86_64-has-stack-protector.sh b/scripts/gcc-x86_64-has-stack-protector.sh
new file mode 100644
index 0000000..325c0a1
--- /dev/null
+++ b/scripts/gcc-x86_64-has-stack-protector.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+
+echo "int foo(void) { char X[200]; return 3; }" | $1 -S -xc -c -O0 -mcmodel=kernel -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+if [ "$?" -eq "0" ] ; then
+ echo $2
+fi