==================================
Memory Attribute Aliasing on IA-64
==================================

Bjorn Helgaas <bjorn.helgaas@hp.com>

May 4, 2006


Memory Attributes
=================

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry. The ones of most interest to the Linux
    kernel are:

    == ======================
    WB Write-back (cacheable)
    UC Uncacheable
    WC Write-coalescing
    == ======================

    System memory typically uses the WB attribute. The UC attribute is
    used for memory-mapped I/O devices. The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space. For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.

Memory Map
==========

    Platform firmware describes the physical memory map and the
    supported attributes for each region. At boot-time, the kernel uses
    the EFI GetMemoryMap() interface. ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space. Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap. Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

Kernel Identity Mappings
========================

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules." Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions. This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software. This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

User Mappings
=============

    User mappings are typically done with 16K or 64K pages. The smaller
    page size allows more flexibility because only a single 16K or 64K
    page has to be homogeneous with respect to memory attributes.

Potential Attribute Aliasing Cases
==================================

    There are several ways the kernel creates new mappings:

mmap of /dev/mem
----------------

    This uses remap_pfn_range(), which creates user mappings. These
    mappings may be either WB or UC.
    If the region being mapped happens to be in kern_memmap, meaning
    that it may also be mapped by a kernel identity mapping, the user
    mapping must use the same attribute as the kernel mapping.

    If the region is not in kern_memmap, the user mapping should use
    an attribute reported as being supported in the EFI memory map.

    Since the EFI memory map does not describe MMIO on some machines,
    a region that the map does not describe should fall back to an
    uncacheable mapping.

mmap of /sys/class/pci_bus/.../legacy_mem
-----------------------------------------

    This is very similar to mmap of /dev/mem, except that legacy_mem
    only allows mmap of the one megabyte "legacy MMIO" area for a
    specific PCI bus. Typically this is the first megabyte of
    physical address space, but it may be different on machines with
    several VGA devices.

    "X" uses this to access VGA frame buffers. Using legacy_mem
    rather than /dev/mem allows multiple instances of X to talk to
    different VGA cards.

    The /dev/mem mmap constraints apply.

mmap of /proc/bus/pci/.../??.?
------------------------------

    This is an MMIO mmap of PCI functions, which may additionally
    request the WC attribute.

    If WC is requested, and the region in kern_memmap is either WC
    or UC, and the EFI memory map designates the region as WC, then
    the WC mapping is allowed.

    Otherwise, the user mapping must use the same attribute as the
    kernel mapping.

read/write of /dev/mem
----------------------

    This uses copy_to_user() (for read) or copy_from_user() (for
    write), which implicitly use a kernel identity mapping. This is
    obviously safe for things in kern_memmap.

    There may be corner cases of things that are not in kern_memmap,
    but could be accessed this way. For example, registers in MMIO
    space are not in kern_memmap, but could be accessed with a UC
    mapping. This would not cause attribute aliasing.
    But registers typically can be accessed only with four-byte or
    eight-byte accesses, and the user copy routines don't allow any
    control over the access size, so this would be dangerous.

ioremap()
---------

    This returns a mapping for use inside the kernel.

    If the region is in kern_memmap, we should use the attribute
    specified there.

    If the EFI memory map reports that the entire granule supports
    WB, we should use that (granules that are partially reserved
    or occupied by firmware do not appear in kern_memmap).

    If the granule contains non-WB memory, but we can cover the
    region safely with kernel page table mappings, we can use
    ioremap_page_range() as most other architectures do.

    Failing all of the above, we have to fall back to a UC mapping.

Past Problem Cases
==================

mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
--------------------------------------------------------------------

    The EFI memory map may not report these MMIO regions.

    These must be allowed so that X will work. This means that
    when the EFI memory map is incomplete, every /dev/mem mmap must
    succeed. It may create either WB or UC user mappings, depending
    on whether the region is in kern_memmap or the EFI memory map.

mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
----------------------------------------------------------------------

    The EFI memory map reports the following attributes:

    =============== ======= ==================
    0x00000-0x9FFFF WB only
    0xA0000-0xBFFFF UC only (VGA frame buffer)
    0xC0000-0xFFFFF WB only
    =============== ======= ==================

    This mmap is done with user pages, not kernel identity mappings,
    so it is safe to use WB mappings.

    The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
    which uses a granule-sized UC mapping. This granule will cover some
    WB-only memory, but since UC is non-speculative, the processor will
    never generate an uncacheable reference to the WB-only areas unless
    the driver explicitly touches them.

mmap of 0x0-0xFFFFF legacy_mem by "X"
-------------------------------------

    If the EFI memory map reports that the entire range supports the
    same attributes, we can allow the mmap (and we will prefer WB if
    supported, as is the case with HP sx[12]000 machines with VGA
    disabled).

    If EFI reports the range as partly WB and partly UC (as on sx[12]000
    machines with VGA enabled), we must fail the mmap because there's no
    safe attribute to use.

    If EFI reports some of the range but not all (as on Intel firmware
    that doesn't report the VGA frame buffer at all), we should fail the
    mmap and force the user to map just the specific region of interest.

mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
------------------------------------------------------------------------

    The EFI memory map reports the following attributes::

        0x00000-0xFFFFF WB only (no VGA MMIO hole)

    This is a special case of the previous case, and the mmap should
    fail for the same reason as above.

read of /sys/devices/.../rom
----------------------------

    For VGA devices, this may cause an ioremap() of 0xC0000. This
    used to be done with a UC mapping, because the VGA frame buffer
    at 0xA0000 prevents use of a WB granule. The UC mapping causes
    an MCA on HP sx[12]000 chipsets.

    We should use WB page table mappings to avoid covering the VGA
    frame buffer.

Notes
=====

    [1] SDM rev 2.2, vol 2, sec 4.4.1.

    [2] SDM rev 2.2, vol 2, sec 4.4.6.