aboutsummaryrefslogtreecommitdiff
path: root/include/uapi/asm-generic
AgeCommit message (Collapse)AuthorFilesLines
2019-02-28Add io_uring IO interfaceGravatar Jens Axboe 1-1/+5
The submission queue (SQ) and completion queue (CQ) rings are shared between the application and the kernel. This eliminates the need to copy data back and forth to submit and complete IO. IO submissions use the io_uring_sqe data structure, and completions are generated in the form of io_uring_cqe data structures. The SQ ring is an index into the io_uring_sqe array, which makes it possible to submit a batch of IOs without them being contiguous in the ring. The CQ ring is always contiguous, as completion events are inherently unordered, and hence any io_uring_cqe entry can point back to an arbitrary submission. Two new system calls are added for this: io_uring_setup(entries, params) Sets up an io_uring instance for doing async IO. On success, returns a file descriptor that the application can mmap to gain access to the SQ ring, CQ ring, and io_uring_sqes. io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize) Initiates IO against the rings mapped to this fd, or waits for them to complete, or both. The behavior is controlled by the parameters passed in. If 'to_submit' is non-zero, then we'll try and submit new IO. If IORING_ENTER_GETEVENTS is set, the kernel will wait for 'min_complete' events, if they aren't already available. It's valid to set IORING_ENTER_GETEVENTS and 'min_complete' == 0 at the same time, this allows the kernel to return already completed events without waiting for them. This is useful only for polling, as for IRQ driven IO, the application can just check the CQ ring without entering the kernel. With this setup, it's possible to do async IO with a single system call. Future developments will enable polled IO with this interface, and polled submission as well. The latter will enable an application to do IO without doing ANY system calls at all. For IRQ driven IO, an application only needs to enter the kernel for completions if it wants to wait for them to occur. Each io_uring is backed by a workqueue, to support buffered async IO as well. We will only punt to an async context if the command would need to wait for IO on the device side. Any data that can be accessed directly in the page cache is done inline. This avoids the slowness issue of usual threadpools, since cached data is accessed as quickly as a sync interface. Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-19asm-generic: Make time32 syscall numbers optionalGravatar Arnd Bergmann 1-0/+36
We don't want new architectures to even provide the old 32-bit time_t based system calls any more, or define the syscall number macros. Add a new __ARCH_WANT_TIME32_SYSCALLS macro that gets enabled for all existing 32-bit architectures using the generic system call table, so we don't change any current behavior. Since this symbol is evaluated in user space as well, we cannot use a Kconfig CONFIG_* macro but have to define it in uapi/asm/unistd.h. On 64-bit architectures, the same system call numbers mostly refer to the system calls we want to keep, as they already pass 64-bit time_t. As new architectures no longer provide these, we need new exceptions in checksyscalls.sh. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-19asm-generic: Drop getrlimit and setrlimit syscalls from default listGravatar Yury Norov 1-0/+5
The newer prlimit64 syscall provides all the functionality of getrlimit and setrlimit syscalls and adds the pid of target process, so future architectures won't need to include getrlimit and setrlimit. Therefore drop getrlimit and setrlimit syscalls from the generic syscall list unless __ARCH_WANT_SET_GET_RLIMIT is defined by the architecture's unistd.h prior to including asm-generic/unistd.h, and adjust all architectures using the generic syscall list to define it so that no in-tree architectures are affected. Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-hexagon@vger.kernel.org Cc: uclinux-h8-devel@lists.sourceforge.jp Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: James Hogan <james.hogan@imgtec.com> [metag] Acked-by: Ley Foon Tan <lftan@altera.com> [nios2] Acked-by: Stafford Horne <shorne@gmail.com> [openrisc] Acked-by: Will Deacon <will.deacon@arm.com> [arm64] Acked-by: Vineet Gupta <vgupta@synopsys.com> #arch/arc bits Signed-off-by: Yury Norov <ynorov@marvell.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-18arch: move common mmap flags to linux/mman.hGravatar Michael S. Tsirkin 1-3/+1
Now that we have 3 mmap flags shared by all architectures, let's move them into the common header. This will help discourage future architectures from duplicating code. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-18compat ABI: use non-compat openat and open_by_handle_at variantsGravatar Yury Norov 1-3/+2
The only difference between native and compat openat and open_by_handle_at is that non-compat version forces O_LARGEFILE, and it should be the default behaviour for all architectures, as we are going to drop the support of 32-bit userspace off_t. Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Signed-off-by: Yury Norov <ynorov@marvell.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGravatar David S. Miller 1-7/+0
The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12Rename include/{uapi => }/asm-generic/shmparam.h reallyGravatar Masahiro Yamada 1-7/+0
Commit 36c0f7f0f899 ("arch: unexport asm/shmparam.h for all architectures") is different from the patch I submitted. My patch is this: https://lore.kernel.org/lkml/1546904307-11124-1-git-send-email-yamada.masahiro@socionext.com/T/#u The file renaming part: rename include/{uapi => }/asm-generic/shmparam.h (100%) was lost when it was picked up. I think it was an accident because Andrew did not say anything. Link: http://lkml.kernel.org/r/1549158277-24558-1-git-send-email-yamada.masahiro@socionext.com Fixes: 36c0f7f0f899 ("arch: unexport asm/shmparam.h for all architectures") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-07y2038: add 64-bit time_t syscalls to all 32-bit architecturesGravatar Arnd Bergmann 1-1/+43
This adds 21 new system calls on each ABI that has 32-bit time_t today. All of these have the exact same semantics as their existing counterparts, and the new ones all have macro names that end in 'time64' for clarification. This gets us to the point of being able to safely use a C library that has 64-bit time_t in user space. There are still a couple of loose ends to tie up in various areas of the code, but this is the big one, and should be entirely uncontroversial at this point. In particular, there are four system calls (getitimer, setitimer, waitid, and getrusage) that don't have a 64-bit counterpart yet, but these can all be safely implemented in the C library by wrapping around the existing system calls because the 32-bit time_t they pass only counts elapsed time, not time since the epoch. They will be dealt with later. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-07y2038: use time32 syscall names on 32-bitGravatar Arnd Bergmann 1-28/+28
This is the big flip, where all 32-bit architectures set COMPAT_32BIT_TIME and use the _time32 system calls from the former compat layer instead of the system calls that take __kernel_timespec and similar arguments. The temporary redirects for __kernel_timespec, __kernel_itimerspec and __kernel_timex can get removed with this. It would be easy to split this commit by architecture, but with the new generated system call tables, it's easy enough to do it all at once, which makes it a little easier to check that the changes are the same in each table. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-07y2038: syscalls: rename y2038 compat syscallsGravatar Arnd Bergmann 1-22/+22
A lot of system calls that pass a time_t somewhere have an implementation using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have been reworked so that this implementation can now be used on 32-bit architectures as well. The missing step is to redefine them using the regular SYSCALL_DEFINEx() to get them out of the compat namespace and make it possible to build them on 32-bit architectures. Any system call that ends in 'time' gets a '32' suffix on its name for that version, while the others get a '_time32' suffix, to distinguish them from the normal version, which takes a 64-bit time argument in the future. In this step, only 64-bit architectures are changed, doing this rename first lets us avoid touching the 32-bit architectures twice. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-03sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEWGravatar Deepa Dinamani 1-2/+9
Add new socket timeout options that are y2038 safe. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: ccaulfie@redhat.com Cc: davem@davemloft.net Cc: deller@gmx.de Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: cluster-devel@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixesGravatar Deepa Dinamani 1-2/+4
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval as the time format. struct timeval is not y2038 safe. The subsequent patches in the series add support for new socket timeout options with _NEW suffix that will use y2038 safe data structures. Although the existing struct timeval layout is sufficiently wide to represent timeouts, because of the way libc will interpret time_t based on user defined flag, these new flags provide a way of having a structure that is the same for all architectures consistently. Rename the existing options with _OLD suffix forms so that the right option is enabled for userspace applications according to the architecture and time_t definition of libc. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: ccaulfie@redhat.com Cc: deller@gmx.de Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: cluster-devel@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMPING_NEWGravatar Deepa Dinamani 1-2/+3
Add SO_TIMESTAMPING_NEW variant of socket timestamp options. This is the y2038 safe versions of the SO_TIMESTAMPING_OLD for all architectures. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: chris@zankel.net Cc: fenghua.yu@intel.com Cc: rth@twiddle.net Cc: tglx@linutronix.de Cc: ubraun@linux.ibm.com Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-s390@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: Add SO_TIMESTAMP[NS]_NEWGravatar Deepa Dinamani 1-2/+13
Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of socket timestamp options. These are the y2038 safe versions of the SO_TIMESTAMP_OLD and SO_TIMESTAMPNS_OLD for all architectures. Note that the format of scm_timestamping.ts[0] is not changed in this patch. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-alpha@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLDGravatar Deepa Dinamani 1-7/+16
SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the way they are currently defined, are not y2038 safe. Subsequent patches in the series add new y2038 safe versions of these options which provide 64 bit timestamps on all architectures uniformly. Hence, rename existing options with OLD tag suffixes. Also note that kernel will not use the untagged SO_TIMESTAMP* and SCM_TIMESTAMP* options internally anymore. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: deller@gmx.de Cc: dhowells@redhat.com Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-afs@lists.infradead.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25arch: add split IPC system calls where neededGravatar Arnd Bergmann 1-0/+1
The IPC system call handling is highly inconsistent across architectures, some use sys_ipc, some use separate calls, and some use both. We also have some architectures that require passing IPC_64 in the flags, and others that set it implicitly. For the addition of a y2038 safe semtimedop() system call, I chose to only support the separate entry points, but that requires first supporting the regular ones with their own syscall numbers. The IPC_64 is now implied by the new semctl/shmctl/msgctl system calls even on the architectures that require passing it with the ipc() multiplexer. I'm not adding the new semtimedop() or semop() on 32-bit architectures, those will get implemented using the new semtimedop_time64() version that gets added along with the other time64 calls. Three 64-bit architectures (powerpc, s390 and sparc) get semtimedop(). Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-01-17net: introduce SO_BINDTOIFINDEX sockoptGravatar David Herrmann 1-0/+2
This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. User-space often refers to network-interfaces via their index, but has to temporarily resolve it to a name for a call into SO_BINDTODEVICE. This might pose problems when the network-device is renamed asynchronously by other parts of the system. When this happens, the SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong device. In most cases user-space only ever operates on devices which they either manage themselves, or otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries this guarantee is non-obvious and it would be nice if that race-condition would simply not exist. It would make it easier for those libraries to operate even in situations where the device-name might change under the hood. A real use-case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network-interfaces during the transition from initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network-namespaces). However, the network manager now has to make sure its management engine for dormant devices will not run in parallel to these renames. Particularly, when you offload operations like DHCP into separate processes, these might setup their sockets early, and thus have to resolve the device-name possibly running into this race-condition. By avoiding a call to resolve the device-name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this race. Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-25Merge tag 'arm64-upstream' of ↵Gravatar Linus Torvalds 1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 festive updates from Will Deacon: "In the end, we ended up with quite a lot more than I expected: - Support for ARMv8.3 Pointer Authentication in userspace (CRIU and kernel-side support to come later) - Support for per-thread stack canaries, pending an update to GCC that is currently undergoing review - Support for kexec_file_load(), which permits secure boot of a kexec payload but also happens to improve the performance of kexec dramatically because we can avoid the sucky purgatory code from userspace. Kdump will come later (requires updates to libfdt). - Optimisation of our dynamic CPU feature framework, so that all detected features are enabled via a single stop_machine() invocation - KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that they can benefit from global TLB entries when KASLR is not in use - 52-bit virtual addressing for userspace (kernel remains 48-bit) - Patch in LSE atomics for per-cpu atomic operations - Custom preempt.h implementation to avoid unconditional calls to preempt_schedule() from preempt_enable() - Support for the new 'SB' Speculation Barrier instruction - Vectorised implementation of XOR checksumming and CRC32 optimisations - Workaround for Cortex-A76 erratum #1165522 - Improved compatibility with Clang/LLD - Support for TX2 system PMUS for profiling the L3 cache and DMC - Reflect read-only permissions in the linear map by default - Ensure MMIO reads are ordered with subsequent calls to Xdelay() - Initial support for memory hotplug - Tweak the threshold when we invalidate the TLB by-ASID, so that mremap() performance is improved for ranges spanning multiple PMDs. - Minor refactoring and cleanups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (125 commits) arm64: kaslr: print PHYS_OFFSET in dump_kernel_offset() arm64: sysreg: Use _BITUL() when defining register bits arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches arm64: cpufeature: Reduce number of pointer auth CPU caps from 6 to 4 arm64: docs: document pointer authentication arm64: ptr auth: Move per-thread keys from thread_info to thread_struct arm64: enable pointer authentication arm64: add prctl control for resetting ptrauth keys arm64: perf: strip PAC when unwinding userspace arm64: expose user PAC bit positions via ptrace arm64: add basic pointer authentication support arm64/cpufeature: detect pointer authentication arm64: Don't trap host pointer auth use to EL2 arm64/kvm: hide ptrauth from guests arm64/kvm: consistently handle host HCR_EL2 flags arm64: add pointer authentication register bits arm64: add comments about EC exception levels arm64: perf: Treat EXCLUDE_EL* bit definitions as unsigned arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field arm64: enable per-task stack canaries ...
2018-12-17bpf: promote bpf_perf_event.h to mandatory UAPI headerGravatar Masahiro Yamada 1-0/+1
Since commit c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type"), all architectures (except um) are required to have bpf_perf_event.h in uapi/asm. Add it to mandatory-y so "make headers_install" can check it. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-06asm-generic: unistd.h: fixup broken macro include.Gravatar Guo Ren 1-0/+4
The broken macros make the glibc compile error. If there is no __NR3264_fstat*, we should also removed related definitions. Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org> Fixes: bf4b6a7d371e ("y2038: Remove stat64 family from default syscall set") [arnd: Both Marcin and Guo provided this patch to fix up my clearly broken commit, I applied the version with the better changelog.] Signed-off-by: Guo Ren <ren_guo@c-sky.com> Signed-off-by: Mao Han <han_mao@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-06asm-generic: add kexec_file_load system call to unistd.hGravatar AKASHI Takahiro 1-1/+3
The initial user of this system call number is arm64. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-10-29Merge tag 'tty-4.20-rc1' of ↵Gravatar Linus Torvalds 1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here is the big tty and serial pull request for 4.20-rc1 Lots of little things here, including a merge from the SPI tree in order to keep things simpler for everyone to sync around for one platform. Major stuff is: - tty buffer clearing after use - atmel_serial fixes and additions - xilinx uart driver updates and of course, lots of tiny fixes and additions to individual serial drivers. All of these have been in linux-next with no reported issues for a while" * tag 'tty-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (66 commits) of: base: Change logic in of_alias_get_alias_list() of: base: Fix english spelling in of_alias_get_alias_list() serial: sh-sci: do not warn if DMA transfers are not supported serial: uartps: Do not allow use aliases >= MAX_UART_INSTANCES tty: check name length in tty_find_polling_driver() serial: sh-sci: Add r8a77990 support tty: wipe buffer if not echoing data tty: wipe buffer. serial: fsl_lpuart: Remove the alias node dependence TTY: sn_console: Replace spin_is_locked() with spin_trylock() Revert "serial:serial_core: Allow use of CTS for PPS line discipline" serial: 8250_uniphier: add auto-flow-control support serial: 8250_uniphier: flatten probe function serial: 8250_uniphier: remove unused "fifo-size" property dt-bindings: serial: sh-sci: Document r8a7744 bindings serial: uartps: Fix missing unlock on error in cdns_get_id() tty/serial: atmel: add ISO7816 support tty/serial_core: add ISO7816 infrastructure serial:serial_core: Allow use of CTS for PPS line discipline serial: docs: Fix filename for serial reference implementation ...
2018-10-25Merge branch 'timers-core-for-linus' of ↵Gravatar Linus Torvalds 1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping updates from Thomas Gleixner: "The timers and timekeeping departement provides: - Another large y2038 update with further preparations for providing the y2038 safe timespecs closer to the syscalls. - An overhaul of the SHCMT clocksource driver - SPDX license identifier updates - Small cleanups and fixes all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) tick/sched : Remove redundant cpu_online() check clocksource/drivers/dw_apb: Add reset control clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE clocksource/drivers: Unify the names to timer-* format clocksource/drivers/sh_cmt: Add R-Car gen3 support dt-bindings: timer: renesas: cmt: document R-Car gen3 support clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines clocksource/drivers/sh_cmt: Fixup for 64-bit machines clocksource/drivers/sh_tmu: Convert to SPDX identifiers clocksource/drivers/sh_mtu2: Convert to SPDX identifiers clocksource/drivers/sh_cmt: Convert to SPDX identifiers clocksource/drivers/renesas-ostm: Convert to SPDX identifiers clocksource: Convert to using %pOFn instead of device_node.name tick/broadcast: Remove redundant check RISC-V: Request newstat syscalls y2038: signal: Change rt_sigtimedwait to use __kernel_timespec y2038: socket: Change recvmmsg to use __kernel_timespec y2038: sched: Change sched_rr_get_interval to use __kernel_timespec y2038: utimes: Rework #ifdef guards for compat syscalls ...
2018-10-24Merge branch 'siginfo-linus' of ↵Gravatar Linus Torvalds 1-93/+100
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "I have been slowly sorting out siginfo and this is the culmination of that work. The primary result is in several ways the signal infrastructure has been made less error prone. The code has been updated so that manually specifying SEND_SIG_FORCED is never necessary. The conversion to the new siginfo sending functions is now complete, which makes it difficult to send a signal without filling in the proper siginfo fields. At the tail end of the patchset comes the optimization of decreasing the size of struct siginfo in the kernel from 128 bytes to about 48 bytes on 64bit. The fundamental observation that enables this is by definition none of the known ways to use struct siginfo uses the extra bytes. This comes at the cost of a small user space observable difference. For the rare case of siginfo being injected into the kernel only what can be copied into kernel_siginfo is delivered to the destination, the rest of the bytes are set to 0. For cases where the signal and the si_code are known this is safe, because we know those bytes are not used. For cases where the signal and si_code combination is unknown the bits that won't fit into struct kernel_siginfo are tested to verify they are zero, and the send fails if they are not. I made an extensive search through userspace code and I could not find anything that would break because of the above change. If it turns out I did break something it will take just the revert of a single change to restore kernel_siginfo to the same size as userspace siginfo. Testing did reveal dependencies on preferring the signo passed to sigqueueinfo over si->signo, so bit the bullet and added the complexity necessary to handle that case. Testing also revealed bad things can happen if a negative signal number is passed into the system calls. Something no sane application will do but something a malicious program or a fuzzer might do. So I have fixed the code that performs the bounds checks to ensure negative signal numbers are handled" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits) signal: Guard against negative signal numbers in copy_siginfo_from_user32 signal: Guard against negative signal numbers in copy_siginfo_from_user signal: In sigqueueinfo prefer sig not si_signo signal: Use a smaller struct siginfo in the kernel signal: Distinguish between kernel_siginfo and siginfo signal: Introduce copy_siginfo_from_user and use it's return value signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE signal: Fail sigqueueinfo if si_signo != sig signal/sparc: Move EMT_TAGOVF into the generic siginfo.h signal/unicore32: Use force_sig_fault where appropriate signal/unicore32: Generate siginfo in ucs32_notify_die signal/unicore32: Use send_sig_fault where appropriate signal/arc: Use force_sig_fault where appropriate signal/arc: Push siginfo generation into unhandled_exception signal/ia64: Use force_sig_fault where appropriate signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn signal/ia64: Use the generic force_sigsegv in setup_frame signal/arm/kvm: Use send_sig_mceerr signal/arm: Use send_sig_fault where appropriate signal/arm: Use force_sig_fault where appropriate ...
2018-10-08Merge 4.19-rc7 into tty-nextGravatar Greg Kroah-Hartman 1-0/+2
We want the fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizesGravatar Anshuman Khandual 1-0/+2
ARM64 architecture also supports 32MB and 512MB HugeTLB page sizes. This just adds mmap() system call argument encoding for them. Link: http://lkml.kernel.org/r/1537841300-6979-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZEGravatar Eric W. Biederman 1-93/+94
Rework the defintion of struct siginfo so that the array padding struct siginfo to SI_MAX_SIZE can be placed in a union along side of the rest of the struct siginfo members. The result is that we no longer need the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-10-03signal/sparc: Move EMT_TAGOVF into the generic siginfo.hGravatar Eric W. Biederman 1-0/+6
When moving all of the architectures specific si_codes into siginfo.h, I apparently overlooked EMT_TAGOVF. Move it now. Remove the now redundant test in siginfo_layout for SIGEMT as now NSIGEMT is always defined. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-10-02tty/serial_core: add ISO7816 infrastructureGravatar Nicolas Ferre 1-0/+2
Add the ISO7816 ioctl and associated accessors and data structure. Drivers can then use this common implementation to handle ISO7816 (smart cards). Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> [ludovic.desroches@microchip.com: squash and rebase, removal of gpios, checkpatch fixes] Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-29y2038: Remove stat64 family from default syscall setGravatar Arnd Bergmann 1-0/+2
New architectures should no longer need stat64, which is not y2038 safe and has been replaced by statx(). This removes the 'select __ARCH_WANT_STAT64' statement from asm-generic/unistd.h and instead moves it into the respective asm/unistd.h UAPI header files for each architecture that uses it today. In the generic file, the system call number and entry points are now made conditional, so newly added architectures (e.g. riscv32 or csky) will never need to carry backwards compatiblity for it. arm64 is the only 64-bit architecture using the asm-generic/unistd.h file, and it already sets __ARCH_WANT_NEW_STAT in its headers, and I use the same #ifdef here: future 64-bit architectures therefore won't see newstat or stat64 any more. They don't suffer from the y2038 time_t overflow, but for consistency it seems best to also let them use statx(). Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextGravatar Linus Torvalds 1-0/+3
Pull networking updates from David Miller: "Highlights: - Gustavo A. R. Silva keeps working on the implicit switch fallthru changes. - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From Luca Coelho. - Re-enable ASPM in r8169, from Kai-Heng Feng. - Add virtual XFRM interfaces, which avoids all of the limitations of existing IPSEC tunnels. From Steffen Klassert. - Convert GRO over to use a hash table, so that when we have many flows active we don't traverse a long list during accumluation. - Many new self tests for routing, TC, tunnels, etc. Too many contributors to mention them all, but I'm really happy to keep seeing this stuff. - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu. - Lots of cleanups and fixes in L2TP code from Guillaume Nault. - Add IPSEC offload support to netdevsim, from Shannon Nelson. - Add support for slotting with non-uniform distribution to netem packet scheduler, from Yousuk Seung. - Add UDP GSO support to mlx5e, from Boris Pismenny. - Support offloading of Team LAG in NFP, from John Hurley. - Allow to configure TX queue selection based upon RX queue, from Amritha Nambiar. - Support ethtool ring size configuration in aquantia, from Anton Mikaev. - Support DSCP and flowlabel per-transport in SCTP, from Xin Long. - Support list based batching and stack traversal of SKBs, this is very exciting work. From Edward Cree. - Busyloop optimizations in vhost_net, from Toshiaki Makita. - Introduce the ETF qdisc, which allows time based transmissions. IGB can offload this in hardware. From Vinicius Costa Gomes. - Add parameter support to devlink, from Moshe Shemesh. - Several multiplication and division optimizations for BPF JIT in nfp driver, from Jiong Wang. - Lots of prepatory work to make more of the packet scheduler layer lockless, when possible, from Vlad Buslov. - Add ACK filter and NAT awareness to sch_cake packet scheduler, from Toke Høiland-Jørgensen. - Support regions and region snapshots in devlink, from Alex Vesker. - Allow to attach XDP programs to both HW and SW at the same time on a given device, with initial support in nfp. From Jakub Kicinski. - Add TLS RX offload and support in mlx5, from Ilya Lesokhin. - Use PHYLIB in r8169 driver, from Heiner Kallweit. - All sorts of changes to support Spectrum 2 in mlxsw driver, from Ido Schimmel. - PTP support in mv88e6xxx DSA driver, from Andrew Lunn. - Make TCP_USER_TIMEOUT socket option more accurate, from Jon Maxwell. - Support for templates in packet scheduler classifier, from Jiri Pirko. - IPV6 support in RDS, from Ka-Cheong Poon. - Native tproxy support in nf_tables, from Máté Eckl. - Maintain IP fragment queue in an rbtree, but optimize properly for in-order frags. From Peter Oskolkov. - Improvde handling of ACKs on hole repairs, from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits) bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT" hv/netvsc: Fix NULL dereference at single queue mode fallback net: filter: mark expected switch fall-through xen-netfront: fix warn message as irq device name has '/' cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0 net: dsa: mv88e6xxx: missing unlock on error path rds: fix building with IPV6=m inet/connection_sock: prefer _THIS_IP_ to current_text_addr net: dsa: mv88e6xxx: bitwise vs logical bug net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() ieee802154: hwsim: using right kind of iteration net: hns3: Add vlan filter setting by ethtool command -K net: hns3: Set tx ring' tc info when netdev is up net: hns3: Remove tx ring BD len register in hns3_enet net: hns3: Fix desc num set to default when setting channel net: hns3: Fix for phy link issue when using marvell phy driver net: hns3: Fix for information of phydev lost problem when down/up net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Add support for serdes loopback selftest bnxt_en: take coredump_record structure off stack ...
2018-07-11asm-generic: unistd.h: Wire up sys_rseqGravatar Will Deacon 1-1/+3
The new rseq call arrived in 4.18-rc1, so provide it in the asm-generic unistd.h for architectures such as arm64. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-04net: Add a new socket option for a future transmit time.Gravatar Richard Cochran 1-0/+3
This patch introduces SO_TXTIME. User space enables this option in order to pass a desired future transmit time in a CMSG when calling sendmsg(2). The argument to this socket option is a 8-bytes long struct provided by the uapi header net_tstamp.h defined as: struct sock_txtime { clockid_t clockid; u32 flags; }; Note that new fields were added to struct sock by filling a 2-bytes hole found in the struct. For that reason, neither the struct size or number of cachelines were altered. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04Merge branch 'timers-2038-for-linus' of ↵Gravatar Linus Torvalds 3-45/+49
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull time/Y2038 updates from Thomas Gleixner: - Consolidate SySV IPC UAPI headers - Convert SySV IPC to the new COMPAT_32BIT_TIME mechanism - Cleanup the core interfaces and standardize on the ktime_get_* naming convention. - Convert the X86 platform ops to timespec64 - Remove the ugly temporary timespec64 hack * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86: Convert x86_platform_ops to timespec64 timekeeping: Add more coarse clocktai/boottime interfaces timekeeping: Add ktime_get_coarse_with_offset timekeeping: Standardize on ktime_get_*() naming timekeeping: Clean up ktime_get_real_ts64 timekeeping: Remove timespec64 hack y2038: ipc: Redirect ipc(SEMTIMEDOP, ...) to compat_ksys_semtimedop y2038: ipc: Enable COMPAT_32BIT_TIME y2038: ipc: Use __kernel_timespec y2038: ipc: Report long times to user space y2038: ipc: Use ktime_get_real_seconds consistently y2038: xtensa: Extend sysvipc data structures y2038: powerpc: Extend sysvipc data structures y2038: sparc: Extend sysvipc data structures y2038: parisc: Extend sysvipc data structures y2038: mips: Extend sysvipc data structures y2038: arm64: Extend sysvipc compat data structures y2038: s390: Remove unneeded ipc uapi header files y2038: ia64: Remove unneeded ipc uapi header files y2038: alpha: Remove unneeded ipc uapi header files ...
2018-06-04Merge branch 'timers-core-for-linus' of ↵Gravatar Linus Torvalds 1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers and timekeeping updates from Thomas Gleixner: - Core infrastucture work for Y2038 to address the COMPAT interfaces: + Add a new Y2038 safe __kernel_timespec and use it in the core code + Introduce config switches which allow to control the various compat mechanisms + Use the new config switch in the posix timer code to control the 32bit compat syscall implementation. - Prevent bogus selection of CPU local clocksources which causes an endless reselection loop - Remove the extra kthread in the clocksource code which has no value and just adds another level of indirection - The usual bunch of trivial updates, cleanups and fixlets all over the place - More SPDX conversions * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) clocksource/drivers/mxs_timer: Switch to SPDX identifier clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier clocksource/drivers/timer-imx-gpt: Remove outdated file path clocksource/drivers/arc_timer: Add comments about locking while read GFRC clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages clocksource/drivers/sprd: Fix Kconfig dependency clocksource: Move inline keyword to the beginning of function declarations timer_list: Remove unused function pointer typedef timers: Adjust a kernel-doc comment tick: Prefer a lower rating device only if it's CPU local device clocksource: Remove kthread time: Change nanosleep to safe __kernel_* types time: Change types to new y2038 safe __kernel_* types time: Fix get_timespec64() for y2038 safe compat interfaces time: Add new y2038 safe __kernel_timespec posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME time: Introduce CONFIG_COMPAT_32BIT_TIME time: Introduce CONFIG_64BIT_TIME in architectures compat: Enable compat_get/put_timespec64 always ...
2018-06-04Merge branch 'siginfo-linus' of ↵Gravatar Linus Torvalds 1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "This set of changes close the known issues with setting si_code to an invalid value, and with not fully initializing struct siginfo. There remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64 and x86 to get the code that generates siginfo into a simpler and more maintainable state. Most of that work involves refactoring the signal handling code and thus careful code review. Also not included is the work to shrink the in kernel version of struct siginfo. That depends on getting the number of places that directly manipulate struct siginfo under control, as it requires the introduction of struct kernel_siginfo for the in kernel things. Overall this set of changes looks like it is making good progress, and with a little luck I will be wrapping up the siginfo work next development cycle" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) signal/sh: Stop gcc warning about an impossible case in do_divide_error signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions signal/um: More carefully relay signals in relay_signal. signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR} signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code signal/signalfd: Add support for SIGSYS signal/signalfd: Remove __put_user from signalfd_copyinfo signal/xtensa: Use force_sig_fault where appropriate signal/xtensa: Consistenly use SIGBUS in do_unaligned_user signal/um: Use force_sig_fault where appropriate signal/sparc: Use force_sig_fault where appropriate signal/sparc: Use send_sig_fault where appropriate signal/sh: Use force_sig_fault where appropriate signal/s390: Use force_sig_fault where appropriate signal/riscv: Replace do_trap_siginfo with force_sig_fault signal/riscv: Use force_sig_fault where appropriate signal/parisc: Use force_sig_fault where appropriate signal/parisc: Use force_sig_mceerr where appropriate signal/openrisc: Use force_sig_fault where appropriate signal/nios2: Use force_sig_fault where appropriate ...
2018-05-02aio: implement io_pgeteventsGravatar Christoph Hellwig 1-1/+3
This is the io_getevents equivalent of ppoll/pselect and allows to properly mix signals and aio completions (especially with IOCB_CMD_POLL) and atomically executes the following sequence: sigset_t origmask; pthread_sigmask(SIG_SETMASK, &sigmask, &origmask); ret = io_getevents(ctx, min_nr, nr, events, timeout); pthread_sigmask(SIG_SETMASK, &origmask, NULL); Note that unlike many other signal related calls we do not pass a sigmask size, as that would get us to 7 arguments, which aren't easily supported by the syscall infrastructure. It seems a lot less painful to just add a new syscall variant in the unlikely case we're going to increase the sigset size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-25signal: Add TRAP_UNK si_code for undiagnosted trap exceptionsGravatar Eric W. Biederman 1-1/+2
Both powerpc and alpha have cases where they wronly set si_code to 0 in combination with SIGTRAP and don't mean SI_USER. About half the time this is because the architecture can not report accurately what kind of trap exception triggered the trap exception. The other half the time it looks like no one has bothered to figure out an appropriate si_code. For the cases where the architecture does not have enough information or is too lazy to figure out exactly what kind of trap exception it is define TRAP_UNK. Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-20y2038: asm-generic: Extend sysvipc data structuresGravatar Arnd Bergmann 3-45/+49
Most architectures now use the asm-generic copy of the sysvipc data structures (msqid64_ds, semid64_ds, shmid64_ds), which use 32-bit __kernel_time_t on 32-bit architectures but have padding behind them to allow extending the type to 64-bit. Unfortunately, that fails on all big-endian architectures, which have the padding on the wrong side. As so many of them get it wrong, we decided to not bother even trying to fix it up when we introduced the asm-generic copy. Instead we always use the padding word now to provide the upper 32 bits of the seconds value, regardless of the endianess. A libc implementation on a typical big-endian system can deal with this by providing its own copy of the structure definition to user space, and swapping the two 32-bit words before returning from the semctl/shmctl/msgctl system calls. Note that msqid64_ds and shmid64_ds were broken on x32 since commit f4b4aae18288 ("x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds"). I have sent a separate fix for that, but as we no longer have to worry about x32 here, I no longer worry about x32 here and use 'unsigned long' instead of __kernel_ulong_t. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-19time: Add new y2038 safe __kernel_timespecGravatar Deepa Dinamani 1-0/+1
The new struct __kernel_timespec is similar to current internal kernel struct timespec64 on 64 bit architecture. The compat structure however is similar to below on little endian systems (padding and tv_nsec are switched for big endian systems): typedef s32 compat_long_t; typedef s64 compat_kernel_time64_t; struct compat_kernel_timespec { compat_kernel_time64_t tv_sec; compat_long_t tv_nsec; compat_long_t padding; }; This allows for both the native and compat representations to be the same and syscalls using this type as part of their ABI can have a single entry point to both. Note that the compat define is not included anywhere in the kernel explicitly to avoid confusion. These types will be used by the new syscalls that will be introduced in the consequent patches. Most of the new syscalls are just an update to the existing native ones with this new type. Hence, put this new type under an ifdef so that the architectures can define CONFIG_64BIT_TIME when they are ready to handle this switch. Cc: linux-arch@vger.kernel.org Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-12Merge branch 'parisc-4.17-2' of ↵Gravatar Linus Torvalds 1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - fix panic when halting system via "shutdown -h now" - drop own coding in favour of generic CONFIG_COMPAT_BINFMT_ELF implementation - add FPE_CONDTRAP constant: last outstanding parisc-specific cleanup for Eric Biedermans siginfo patches - move some functions to .init and some to .text.hot linker sections * 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Prevent panic at system halt parisc: Switch to generic COMPAT_BINFMT_ELF parisc: Move cache flush functions into .text.hot section parisc/signal: Add FPE_CONDTRAP for conditional trap handling
2018-04-11fs, elf: drop MAP_FIXED usage from elf_mapGravatar Michal Hocko 1-1/+3
Both load_elf_interp and load_elf_binary rely on elf_map to map segments on a controlled address and they use MAP_FIXED to enforce that. This is however dangerous thing prone to silent data corruption which can be even exploitable. Let's take CVE-2017-1000253 as an example. At the time (before commit eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE") ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from the stack top on 32b (legacy) memory layout (only 1GB away). Therefore we could end up mapping over the existing stack with some luck. The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved moved much further from the stack (eab09532d400 and later by c715b72c1ba4: "mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive stack consumption early during execve fully stopped by da029c11e6b1 ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be safe and any attack should be impractical. On the other hand this is just too subtle assumption so it can break quite easily and hard to spot. I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still fundamentally dangerous. Moreover it shouldn't be even needed. We are at the early process stage and so there shouldn't be unrelated mappings (except for stack and loader) existing so mmap for a given address should succeed even without MAP_FIXED. Something is terribly wrong if this is not the case and we should rather fail than silently corrupt the underlying mapping. Address this issue by changing MAP_FIXED to the newly added MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an existing mapping clashing with the requested one without clobbering it. [mhocko@suse.com: fix build] [akpm@linux-foundation.org: coding-style fixes] [avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC] Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm: introduce MAP_FIXED_NOREPLACEGravatar Michal Hocko 1-0/+1
Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2. This has started as a follow up discussion [3][4] resulting in the runtime failure caused by hardening patch [5] which removes MAP_FIXED from the elf loader because MAP_FIXED is inherently dangerous as it might silently clobber an existing underlying mapping (e.g. stack). The reason for the failure is that some architectures enforce an alignment for the given address hint without MAP_FIXED used (e.g. for shared or file backed mappings). One way around this would be excluding those archs which do alignment tricks from the hardening [6]. The patch is really trivial but it has been objected, rightfully so, that this screams for a more generic solution. We basically want a non-destructive MAP_FIXED. The first patch introduced MAP_FIXED_NOREPLACE which enforces the given address but unlike MAP_FIXED it fails with EEXIST if the given range conflicts with an existing one. The flag is introduced as a completely new one rather than a MAP_FIXED extension because of the backward compatibility. We really want a never-clobber semantic even on older kernels which do not recognize the flag. Unfortunately mmap sucks wrt flags evaluation because we do not EINVAL on unknown flags. On those kernels we would simply use the traditional hint based semantic so the caller can still get a different address (which sucks) but at least not silently corrupt an existing mapping. I do not see a good way around that. Except we won't export expose the new semantic to the userspace at all. It seems there are users who would like to have something like that. Jemalloc has been mentioned by Michael Ellerman [7] Florian Weimer has mentioned the following: : glibc ld.so currently maps DSOs without hints. This means that the kernel : will map right next to each other, and the offsets between them a completely : predictable. We would like to change that and supply a random address in a : window of the address space. If there is a conflict, we do not want the : kernel to pick a non-random address. Instead, we would try again with a : random address. John Hubbard has mentioned CUDA example : a) Searches /proc/<pid>/maps for a "suitable" region of available : VA space. "Suitable" generally means it has to have a base address : within a certain limited range (a particular device model might : have odd limitations, for example), it has to be large enough, and : alignment has to be large enough (again, various devices may have : constraints that lead us to do this). : : This is of course subject to races with other threads in the process. : : Let's say it finds a region starting at va. : : b) Next it does: : p = mmap(va, ...) : : *without* setting MAP_FIXED, of course (so va is just a hint), to : attempt to safely reserve that region. If p != va, then in most cases, : this is a failure (almost certainly due to another thread getting a : mapping from that region before we did), and so this layer now has to : call munmap(), before returning a "failure: retry" to upper layers. : : IMPROVEMENT: --> if instead, we could call this: : : p = mmap(va, ... MAP_FIXED_NOREPLACE ...) : : , then we could skip the munmap() call upon failure. This : is a small thing, but it is useful here. (Thanks to Piotr : Jaroszynski and Mark Hairgrove for helping me get that detail : exactly right, btw.) : : c) After that, CUDA suballocates from p, via: : : q = mmap(sub_region_start, ... MAP_FIXED ...) : : Interestingly enough, "freeing" is also done via MAP_FIXED, and : setting PROT_NONE to the subregion. Anyway, I just included (c) for : general interest. Atomic address range probing in the multithreaded programs in general sounds like an interesting thing to me. The second patch simply replaces MAP_FIXED use in elf loader by MAP_FIXED_NOREPLACE. I believe other places which rely on MAP_FIXED should follow. Actually real MAP_FIXED usages should be docummented properly and they should be more of an exception. [1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org [3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au [4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com [5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org [6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz [7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au This patch (of 2): MAP_FIXED is used quite often to enforce mapping at the particular range. The main problem of this flag is, however, that it is inherently dangerous because it unmaps existing mappings covered by the requested range. This can cause silent memory corruptions. Some of them even with serious security implications. While the current semantic might be really desiderable in many cases there are others which would want to enforce the given range but rather see a failure than a silent memory corruption on a clashing range. Please note that there is no guarantee that a given range is obeyed by the mmap even when it is free - e.g. arch specific code is allowed to apply an alignment. Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this behavior. It has the same semantic as MAP_FIXED wrt. the given address request with a single exception that it fails with EEXIST if the requested address is already covered by an existing mapping. We still do rely on get_unmaped_area to handle all the arch specific MAP_FIXED treatment and check for a conflicting vma after it returns. The flag is introduced as a completely new one rather than a MAP_FIXED extension because of the backward compatibility. We really want a never-clobber semantic even on older kernels which do not recognize the flag. Unfortunately mmap sucks wrt. flags evaluation because we do not EINVAL on unknown flags. On those kernels we would simply use the traditional hint based semantic so the caller can still get a different address (which sucks) but at least not silently corrupt an existing mapping. I do not see a good way around that. [mpe@ellerman.id.au: fix whitespace] [fail on clashing range with EEXIST as per Florian Weimer] [set MAP_FIXED before round_hint_to_min as per Khalid Aziz] Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Russell King - ARM Linux <linux@armlinux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Jason Evans <jasone@google.com> Cc: David Goldblatt <davidtgoldblatt@gmail.com> Cc: Edward Tomasz Napierała <trasz@FreeBSD.org> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11parisc/signal: Add FPE_CONDTRAP for conditional trap handlingGravatar Helge Deller 1-1/+2
Posix and common sense requires that SI_USER not be a signal specific si_code. Thus add a new FPE_CONDTRAP si_code for conditional traps. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2018-04-05Merge branch 'siginfo-linus' of ↵Gravatar Linus Torvalds 1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "The work on cleaning up and getting the bugs out of siginfo generation was largely stalled this round. The progress that was made was the definition of FPE_FLTUNK. Which is usable to fix many of the cases where siginfo generation is erroneously generating SI_USER by setting si_code to 0, that has recently been tagged as FPE_FIXME. You already have the change by way of the arm64 tree as that definition was pulled into the arm64 tree to allow fixing the problem there. What remains is the second round of fixing for what I thought was a trivial change to the struct siginfo when put the union in _sigfault where it belongs. Do to historical reasons 32bit m68k only ensures that pointers are 2 byte aligned. So I have added a m68k test case made of BUILD_BUG_ONs to verify I have this fix correct and possibly catch problems, and I have computed the number of bytes of padding needed for the _addr_bnd and _addr_pkey cases and just use an array of characters that size. For pure paranoia I have written the code so if there is an architecture out there that does not perform any alignment of structures it should still work. With the removal of all of the stale arechitectures this cycle future work on cleaning up struct siginfo should be much easier. Almost all of the conflicting si_code definitions have been removed with the removal of (blackfin, tile, and frv). Plus some of the most difficult to test cases have simply been removed from the tree. Which means that with a little luck copy_siginfo_to_user can become a light weight wrapper around copy_to_user in the next cycle" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: m68k: Verify the offsets in struct siginfo never change. signal: Correct the offset of si_pkey and si_lower in struct siginfo on m68k
2018-04-04Merge tag 'arm64-upstream' of ↵Gravatar Linus Torvalds 1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "Nothing particularly stands out here, probably because people were tied up with spectre/meltdown stuff last time around. Still, the main pieces are: - Rework of our CPU features framework so that we can whitelist CPUs that don't require kpti even in a heterogeneous system - Support for the IDC/DIC architecture extensions, which allow us to elide instruction and data cache maintenance when writing out instructions - Removal of the large memory model which resulted in suboptimal codegen by the compiler and increased the use of literal pools, which could potentially be used as ROP gadgets since they are mapped as executable - Rework of forced signal delivery so that the siginfo_t is well-formed and handling of show_unhandled_signals is consolidated and made consistent between different fault types - More siginfo cleanup based on the initial patches from Eric Biederman - Workaround for Cortex-A55 erratum #1024718 - Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi - Misc cleanups and non-critical fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits) arm64: uaccess: Fix omissions from usercopy whitelist arm64: fpsimd: Split cpu field out from struct fpsimd_state arm64: tlbflush: avoid writing RES0 bits arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC arm64: fpsimd: include <linux/init.h> in fpsimd.h drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor perf: arm_spe: include linux/vmalloc.h for vmap() Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)" arm64: cpufeature: Avoid warnings due to unused symbols arm64: Add work around for Arm Cortex-A55 Erratum 1024718 arm64: Delay enabling hardware DBM feature arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35 arm64: capabilities: Handle shared entries arm64: capabilities: Add support for checks based on a list of MIDRs arm64: Add helpers for checking CPU MIDR against a range arm64: capabilities: Clean up midr range helpers arm64: capabilities: Change scope of VHE to Boot CPU feature ...
2018-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextGravatar Linus Torvalds 1-1/+4
Pull sparc updates from David Miller: 1) Add support for ADI (Application Data Integrity) found in more recent sparc64 cpus. Essentially this is keyed based access to virtual memory, and if the key encoded in the virual address is wrong you get a trap. The mm changes were reviewed by Andrew Morton and others. Work by Khalid Aziz. 2) Validate DAX completion index range properly, from Rob Gardner. 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc64: Make atomic_xchg() an inline function rather than a macro. sparc64: Properly range check DAX completion index sparc: Make auxiliary vectors for ADI available on 32-bit as well sparc64: Oracle DAX driver depends on SPARC64 sparc64: Update signal delivery to use new helper functions sparc64: Add support for ADI (Application Data Integrity) mm: Allow arch code to override copy_highpage() mm: Clear arch specific VM flags on protection change mm: Add address parameter to arch_validate_prot() sparc64: Add auxiliary vectors to report platform ADI properties sparc64: Add handler for "Memory Corruption Detected" trap sparc64: Add HV fault type handlers for ADI related faults sparc64: Add support for ADI register fields, ASIs and traps mm, swap: Add infrastructure for saving page metadata on swap signals, sparc: Add signal codes for ADI violations
2018-04-02Merge tag 'arch-removal' of ↵Gravatar Linus Torvalds 2-209/+10
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pul removal of obsolete architecture ports from Arnd Bergmann: "This removes the entire architecture code for blackfin, cris, frv, m32r, metag, mn10300, score, and tile, including the associated device drivers. I have been working with the (former) maintainers for each one to ensure that my interpretation was right and the code is definitely unused in mainline kernels. Many had fond memories of working on the respective ports to start with and getting them included in upstream, but also saw no point in keeping the port alive without any users. In the end, it seems that while the eight architectures are extremely different, they all suffered the same fate: There was one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem, which was more costly than licensing newer off-the-shelf CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems that all the SoC product lines are still around, but have not used the custom CPU architectures for several years at this point. In contrast, CPU instruction sets that remain popular and have actively maintained kernel ports tend to all be used across multiple licensees. [ See the new nds32 port merged in the previous commit for the next generation of "one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem" - Linus ] The removal came out of a discussion that is now documented at https://lwn.net/Articles/748074/. Unlike the original plans, I'm not marking any ports as deprecated but remove them all at once after I made sure that they are all unused. Some architectures (notably tile, mn10300, and blackfin) are still being shipped in products with old kernels, but those products will never be updated to newer kernel releases. After this series, we still have a few architectures without mainline gcc support: - unicore32 and hexagon both have very outdated gcc releases, but the maintainers promised to work on providing something newer. At least in case of hexagon, this will only be llvm, not gcc. - openrisc, risc-v and nds32 are still in the process of finishing their support or getting it added to mainline gcc in the first place. They all have patched gcc-7.3 ports that work to some degree, but complete upstream support won't happen before gcc-8.1. Csky posted their first kernel patch set last week, their situation will be similar [ Palmer Dabbelt points out that RISC-V support is in mainline gcc since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]" This really says it all: 2498 files changed, 95 insertions(+), 467668 deletions(-) * tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits) MAINTAINERS: UNICORE32: Change email account staging: iio: remove iio-trig-bfin-timer driver tty: hvc: remove tile driver tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers serial: remove tile uart driver serial: remove m32r_sio driver serial: remove blackfin drivers serial: remove cris/etrax uart drivers usb: Remove Blackfin references in USB support usb: isp1362: remove blackfin arch glue usb: musb: remove blackfin port usb: host: remove tilegx platform glue pwm: remove pwm-bfin driver i2c: remove bfin-twi driver spi: remove blackfin related host drivers watchdog: remove bfin_wdt driver can: remove bfin_can driver mmc: remove bfin_sdh driver input: misc: remove blackfin rotary driver input: keyboard: remove bf54x driver ...
2018-04-02signal: Correct the offset of si_pkey and si_lower in struct siginfo on m68kGravatar Eric W. Biederman 1-2/+5
The change moving addr_lsb into the _sigfault union failed to take into account that _sigfault._addr_bnd._lower being a pointer forced the entire union to have pointer alignment. The fix for _sigfault._addr_bnd._lower having pointer alignment failed to take into account that m68k has a pointer alignment less than the size of a pointer. So simply making the padding members pointers changed the location of later members in the structure. Fix this by directly computing the needed size of the padding members, and making the padding members char arrays of the needed size. AKA if __alignof__(void *) is 1 sizeof(short) otherwise __alignof__(void *). Which should be exactly the same rules the compiler whould have used when computing the padding. I have tested this change by adding BUILD_BUG_ONs to m68k to verify the offset of every member of struct siginfo, and with those testing that the offsets of the fields in struct siginfo is the same before I changed the generic _sigfault member and after the correction to the _sigfault member. I have also verified that the x86 with it's own BUILD_BUG_ONs to verify the offsets of the siginfo members also compiles cleanly. Cc: stable@vger.kernel.org Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Fixes: 859d880cf544 ("signal: Correct the offset of si_pkey in struct siginfo") Fixes: b68a68d3dcc1 ("signal: Move addr_lsb into the _sigfault union for clarity") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-03-26asm-generic: clean up asm/unistd.hGravatar Arnd Bergmann 1-163/+0
The score architecture used a number of old system calls for compatibility with a traditional libc port, all architectures that got added later skip these. With score out of the way, we can finally clean up the syscall list to no longer provide these. Signed-off-by: Arnd Bergmann <arnd@arndb.de>