aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/core
AgeCommit message (Collapse)AuthorFilesLines
2020-08-23treewide: Use fallthrough pseudo-keywordGravatar Gustavo A. R. Silva 5-12/+12
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-20RDMA/core: Fix spelling mistake "Could't" -> "Couldn't"Gravatar Colin Ian King 1-1/+1
There is a spelling mistake in a pr_warn message. Fix it. Link: https://lore.kernel.org/r/20200810075824.46770-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-12mm/gup: remove task_struct pointer for all gup codeGravatar Peter Xu 1-1/+1
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaGravatar Linus Torvalds 20-670/+665
Pull rdma updates from Jason Gunthorpe: "A quiet cycle after the larger 5.8 effort. Substantially cleanup and driver work with a few smaller features this time. - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re - Removal of dead or redundant code across the drivers - RAW resource tracker dumps to include a device specific data blob for device objects to aide device debugging - Further advance the IOCTL interface, remove the ability to turn it off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands - Remove stubs related to devices with no pkey table - A shared CQ scheme to allow multiple ULPs to share the CQ rings of a device to give higher performance - Several more static checker, syzkaller and rare crashers fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits) RDMA/mlx5: Fix flow destination setting for RDMA TX flow table RDMA/rxe: Remove pkey table RDMA/umem: Add a schedule point in ib_umem_get() RDMA/hns: Fix the unneeded process when getting a general type of CQE error RDMA/hns: Fix error during modify qp RTS2RTS RDMA/hns: Delete unnecessary memset when allocating VF resource RDMA/hns: Remove redundant parameters in set_rc_wqe() RDMA/hns: Remove support for HIP08_A RDMA/hns: Refactor hns_roce_v2_set_hem() RDMA/hns: Remove redundant hardware opcode definitions RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP RDMA/include: Replace license text with SPDX tags RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting RDMA/cma: Execute rdma_cm destruction from a handler properly RDMA/cma: Remove unneeded locking for req paths RDMA/cma: Using the standard locking pattern when delivering the removal event RDMA/cma: Simplify DEVICE_REMOVAL for internal_id RDMA/efa: Add EFA 0xefa1 PCI ID RDMA/efa: User/kernel compatibility handshake mechanism ...
2020-08-04Merge tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mappingGravatar Linus Torvalds 1-1/+5
Pull dma-mapping updates from Christoph Hellwig: - make support for dma_ops optional - move more code out of line - add generic support for a dma_ops bypass mode - misc cleanups * tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping: dma-contiguous: cleanup dma_alloc_contiguous dma-debug: use named initializers for dir2name powerpc: use the generic dma_ops_bypass mode dma-mapping: add a dma_ops_bypass flag to struct device dma-mapping: make support for dma ops optional dma-mapping: inline the fast path dma-direct calls dma-mapping: move the remaining DMA API calls out of line
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Gravatar Linus Torvalds 1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-07-31RDMA/umem: Add a schedule point in ib_umem_get()Gravatar Eric Dumazet 1-0/+1
Mapping as little as 64GB can take more than 10 seconds, triggering issues on kernels with CONFIG_PREEMPT_NONE=y. ib_umem_get() already splits the work in 2MB units on x86_64, adding a cond_resched() in the long-lasting loop is enough to solve the issue. Note that sg_alloc_table() can still use more than 100 ms, which is also problematic. This might be addressed later in ib_umem_add_sg_table(), adding new blocks in sgl on demand. Link: https://lore.kernel.org/r/20200730015755.1827498-1-edumazet@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30RDMA/core: Free DIM memory in error unwindGravatar Leon Romanovsky 1-0/+1
The memory allocated for the DIM wasn't freed in in error unwind path, fix it by calling to rdma_dim_destroy(). Fixes: da6629793aa6 ("RDMA/core: Provide RDMA DIM support for ULPs") Link: https://lore.kernel.org/r/20200730082719.1582397-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com <mailto:maxg@mellanox.com>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30RDMA/core: Stop DIM before destroying CQGravatar Leon Romanovsky 1-3/+10
HW destroy operation should be last operation after all possible CQ users completed their work, so move DIM work cancellation before such destroy call. Fixes: da6629793aa6 ("RDMA/core: Provide RDMA DIM support for ULPs") Link: https://lore.kernel.org/r/20200730082719.1582397-3-leon@kernel.org Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QPGravatar Mark Zhang 1-3/+0
When dumping QPs bound to a counter, raw QPs should be allowed to dump without the CAP_NET_RAW privilege. This is consistent with what "rdma res show qp" does. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/20200727095828.496195-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/cma: Execute rdma_cm destruction from a handler properlyGravatar Jason Gunthorpe 1-90/+84
When a rdma_cm_id needs to be destroyed after a handler callback fails, part of the destruction pattern is open coded into each call site. Unfortunately the blind assignment to state discards important information needed to do cma_cancel_operation(). This results in active operations being left running after rdma_destroy_id() completes, and the use-after-free bugs from KASAN. Consolidate this entire pattern into destroy_id_handler_unlock() and manage the locking correctly. The state should be set to RDMA_CM_DESTROYING under the handler_lock to atomically ensure no futher handlers are called. Link: https://lore.kernel.org/r/20200723070707.1771101-5-leon@kernel.org Reported-by: syzbot+08092148130652a6faae@syzkaller.appspotmail.com Reported-by: syzbot+a929647172775e335941@syzkaller.appspotmail.com Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/cma: Remove unneeded locking for req pathsGravatar Jason Gunthorpe 1-25/+6
The REQ flows are concerned that once the handler is called on the new cm_id the ULP can choose to trigger a rdma_destroy_id() concurrently at any time. However, this is not true, while the ULP can call rdma_destroy_id(), it immediately blocks on the handler_mutex which prevents anything harmful from running concurrently. Remove the confusing extra locking and refcounts and make the handler_mutex protecting state during destroy more clear. Link: https://lore.kernel.org/r/20200723070707.1771101-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/cma: Using the standard locking pattern when delivering the removal eventGravatar Jason Gunthorpe 1-26/+36
Whenever an event is delivered to the handler it should be done under the handler_mutex and upon any non-zero return from the handler it should trigger destruction of the cm_id. cma_process_remove() skips some steps here, it is not necessarily wrong since the state change should prevent any races, but it is confusing and unnecessary. Follow the standard pattern here, with the slight twist that the transition to RDMA_CM_DEVICE_REMOVAL includes a cma_cancel_operation(). Link: https://lore.kernel.org/r/20200723070707.1771101-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/cma: Simplify DEVICE_REMOVAL for internal_idGravatar Jason Gunthorpe 1-1/+5
cma_process_remove() triggers an unconditional rdma_destroy_id() for internal_id's and skips the event deliver and transition through RDMA_CM_DEVICE_REMOVAL. This is confusing and unnecessary. internal_id always has cma_listen_handler() as the handler, have it catch the RDMA_CM_DEVICE_REMOVAL event and directly consume it and signal removal. This way the FSM sequence never skips the DEVICE_REMOVAL case and the logic in this hard to test area is simplified. Link: https://lore.kernel.org/r/20200723070707.1771101-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27RDMA/core: Fix return error value in _ib_modify_qp() to negativeGravatar Li Heng 1-1/+1
The error codes in _ib_modify_qp() are supposed to be negative errno. Fixes: 7a5c938b9ed0 ("IB/core: Check for rdma_protocol_ib only after validating port_num") Link: https://lore.kernel.org/r/1595645787-20375-1-git-send-email-liheng40@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Heng <liheng40@huawei.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27RDMA/cm: Add min length checks to user structure copiesGravatar Jason Gunthorpe 1-0/+4
These are missing throughout ucma, it harmlessly copies garbage from userspace, but in this new code which uses min to compute the copy length it can result in uninitialized stack memory. Check for minimum length at the very start. BUG: KMSAN: uninit-value in ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 CPU: 0 PID: 8457 Comm: syz-executor069 Not tainted 5.8.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1df/0x240 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 ucma_write+0x5c5/0x630 drivers/infiniband/core/ucma.c:1764 do_loop_readv_writev fs/read_write.c:737 [inline] do_iter_write+0x710/0xdc0 fs/read_write.c:1020 vfs_writev fs/read_write.c:1091 [inline] do_writev+0x42d/0x8f0 fs/read_write.c:1134 __do_sys_writev fs/read_write.c:1207 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1204 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1204 do_syscall_64+0xb0/0x150 arch/x86/entry/common.c:386 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 34e2ab57a911 ("RDMA/ucma: Extend ucma_connect to receive ECE parameters") Fixes: 0cb15372a615 ("RDMA/cma: Connect ECE to rdma_accept") Link: https://lore.kernel.org/r/0-v1-d5b86dab17dc+28c25-ucma_syz_min_jgg@nvidia.com Reported-by: syzbot+086ab5ca9eafd2379aa6@syzkaller.appspotmail.com Reported-by: syzbot+7446526858b83c8828b2@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/uverbs: Silence shiftTooManyBitsSigned warningGravatar Leon Romanovsky 1-1/+1
Fix reported by kbuild warning. drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned] BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31)); ^ Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/uverbs: Remove redundant assignmentsGravatar Leon Romanovsky 1-5/+5
The kbuild reported the following warning, so clean whole uverbs_cmd.c file. drivers/infiniband/core/uverbs_cmd.c:1066:6: warning: Variable 'ret' is reassigned a value before the old one has been used. [redundantAssignment] ret = uverbs_request(attrs, &cmd, sizeof(cmd)); ^ drivers/infiniband/core/uverbs_cmd.c:1064:0: note: Variable 'ret' is reassigned a value before the old one has been used. int ret = -EINVAL; ^ Link: https://lore.kernel.org/r/20200720175627.1273096-2-leon@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/mlx5: Add missing srcu_read_lock in ODP implicit flowGravatar Maor Gottlieb 2-1/+4
According to the locking scheme, mlx5_ib_update_xlt() should be called with srcu_read_lock(dev->odp->srcu). Prefetch missed this. This fixes the below WARN from lockdep_assert_held(): WARNING: CPU: 1 PID: 1130 at drivers/infiniband/hw/mlx5/odp.c:132 mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core CPU: 1 PID: 1130 Comm: kworker/u16:11 Tainted: G W 5.8.0-rc5_for_upstream_debug_2020_07_13_11_04 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib] RIP: 0010:mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib] Code: 08 e2 85 c0 0f 84 65 ff ff ff 49 8b 87 60 01 00 00 be ff ff ff ff 48 8d b8 b0 39 00 00 e8 93 e0 50 e1 85 c0 0f 85 45 ff ff ff <0f> 0b e9 3e ff ff ff 0f 0b eb c7 0f 1f 44 00 00 48 8b 87 98 0f 00 RSP: 0018:ffff88840f44fc68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88840cc9d000 RCX: ffff88840efcd940 RDX: 0000000000000000 RSI: ffff88844871b9b0 RDI: ffff88840efce100 RBP: ffff88840cc9d040 R08: 0000000000000040 R09: 0000000000000001 R10: ffff88846ced3068 R11: 0000000000000000 R12: 00000000000156ec R13: 0000000000000004 R14: 0000000000000004 R15: ffff888439941000 FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8536d12430 CR3: 0000000437a5e006 CR4: 0000000000360ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5_ib_update_xlt+0x37c/0x7c0 [mlx5_ib] pagefault_mr+0x315/0x440 [mlx5_ib] mlx5_ib_prefetch_mr_work+0x56/0xa0 [mlx5_ib] process_one_work+0x215/0x5c0 worker_thread+0x3c/0x380 ? process_one_work+0x5c0/0x5c0 kthread+0x133/0x150 ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x30 Hold the SRCU during prefetch, even though it strictly isn't needed since prefetch is holding the num_deferred_work it does make it easier to reason about. Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy") Link: https://lore.kernel.org/r/20200719065747.131157-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/core: Update write interface to use automatic object lifetimeGravatar Leon Romanovsky 1-207/+86
The automatic object lifetime model allows us to change the write() interface to have the same logic as the ioctl() path. Update the create/alloc functions to be in the following format, so the code flow will be the same: * Allocate objects * Initialize them * Call to the drivers, this is last step that is allowed to fail * Finalize object * Return response and allow to core code to handle abort/commit respectively. Link: https://lore.kernel.org/r/20200719052223.75245-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/core: Align abort/commit object scheme for write() and ioctl() pathsGravatar Leon Romanovsky 2-1/+10
Create the same logic flow for the write() interface as we have for the ioctl() path by making sure that the object is committed or aborted automatically after HW object creation. Link: https://lore.kernel.org/r/20200719052223.75245-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20RDMA/core: Remove query_pkey from the mandatory opsGravatar Kamal Heib 1-1/+3
The query_pkey() isn't mandatory for the iwarp providers, so remove this requirement from the RDMA core. Link: https://lore.kernel.org/r/20200714183414.61069-4-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20RDMA/core: Allocate the pkey cache only if the pkey_tbl_len is setGravatar Kamal Heib 1-16/+29
Allocate the pkey cache only if the pkey_tbl_len is set by the provider, also add checks to avoid accessing the pkey cache when it not initialized. Link: https://lore.kernel.org/r/20200714183414.61069-3-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20RDMA/core: Expose pkeys sysfs files only if pkey_tbl_len is setGravatar Kamal Heib 1-20/+41
Expose the pkeys sysfs files only if the pkey_tbl_len is set by the providers. Link: https://lore.kernel.org/r/20200714183414.61069-2-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-19dma-mapping: make support for dma ops optionalGravatar Christoph Hellwig 1-1/+5
Avoid the overhead of the dma ops support for tiny builds that only use the direct mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-16treewide: Remove uninitialized_var() usageGravatar Kees Cook 1-2/+2
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16RDMA/cm: Protect access to remote_sidr_tableGravatar Maor Gottlieb 1-0/+2
cm.lock must be held while accessing remote_sidr_table. This fixes the below NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 2 PID: 7288 Comm: udaddy Not tainted 5.7.0_for_upstream_perf_2020_06_09_15_14_20_38 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:rb_erase+0x10d/0x360 Code: 00 00 00 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 74 48 f6 02 01 75 af 48 8b 7a 10 48 89 c1 48 83 c9 01 48 89 78 08 48 89 42 10 <48> 89 0f 48 8b 08 48 89 0a 48 83 e1 fc 48 89 10 0f 84 b1 00 00 00 RSP: 0018:ffffc90000f77c30 EFLAGS: 00010086 RAX: ffff8883df27d458 RBX: ffff8883df27da58 RCX: ffff8883df27d459 RDX: ffff8883d183fa58 RSI: ffffffffa01e8d00 RDI: 0000000000000000 RBP: ffff8883d62ac800 R08: 0000000000000000 R09: 00000000000000ce R10: 000000000000000a R11: 0000000000000000 R12: ffff8883df27da00 R13: ffffc90000f77c98 R14: 0000000000000130 R15: 0000000000000000 FS: 00007f009f877740(0000) GS:ffff8883f1a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000003d467e003 CR4: 0000000000160ee0 Call Trace: cm_send_sidr_rep_locked+0x15a/0x1a0 [ib_cm] ib_send_cm_sidr_rep+0x2b/0x50 [ib_cm] cma_send_sidr_rep+0x8b/0xe0 [rdma_cm] __rdma_accept+0x21d/0x2b0 [rdma_cm] ? ucma_get_ctx+0x2b/0xe0 [rdma_ucm] ? _copy_from_user+0x30/0x60 ucma_accept+0x13e/0x1e0 [rdma_ucm] ucma_write+0xb4/0x130 [rdma_ucm] vfs_write+0xad/0x1a0 ksys_write+0x9d/0xb0 do_syscall_64+0x48/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f009ef60924 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 2a ef 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 RSP: 002b:00007fff843edf38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000055743042e1d0 RCX: 00007f009ef60924 RDX: 0000000000000130 RSI: 00007fff843edf40 RDI: 0000000000000003 RBP: 00007fff843ee0e0 R08: 0000000000000000 R09: 0000557430433090 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff843edf40 R14: 000000000000038c R15: 00000000ffffff00 CR2: 0000000000000000 Fixes: 6a8824a74bc9 ("RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock") Link: https://lore.kernel.org/r/20200716105519.1424266-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16RDMA/core: Fix race in rdma_alloc_commit_uobject()Gravatar Leon Romanovsky 1-3/+3
The FD should not be installed until all of the setup is completed as the fd_install() transfers ownership of the kref to the FD table. A thread can race a close() and trigger concurrent rdma_alloc_commit_uobject() and uverbs_uobject_fd_release() which, at least, triggers a safety WARN_ON: WARNING: CPU: 4 PID: 6913 at drivers/infiniband/core/rdma_core.c:768 uverbs_uobject_fd_release+0x202/0x230 Kernel panic - not syncing: panic_on_warn set ... CPU: 4 PID: 6913 Comm: syz-executor.3 Not tainted 5.7.0-rc2 #22 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [..] RIP: 0010:uverbs_uobject_fd_release+0x202/0x230 Code: fe 4c 89 e7 e8 af 23 fe ff e9 2a ff ff ff e8 c5 fa 61 fe be 03 00 00 00 4c 89 e7 e8 68 eb f5 fe e9 13 ff ff ff e8 ae fa 61 fe <0f> 0b eb ac e8 e5 aa 3c fe e8 50 2b 86 fe e9 6a fe ff ff e8 46 2b RSP: 0018:ffffc90008117d88 EFLAGS: 00010293 RAX: ffff88810e146580 RBX: 1ffff92001022fb1 RCX: ffffffff82d5b902 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88811951b040 RBP: ffff88811951b000 R08: ffffed10232a3609 R09: ffffed10232a3609 R10: ffff88811951b043 R11: 0000000000000001 R12: ffff888100a7c600 R13: ffff888100a7c650 R14: ffffc90008117da8 R15: ffffffff82d5b700 ? __uverbs_cleanup_ufile+0x270/0x270 ? uverbs_uobject_fd_release+0x202/0x230 ? uverbs_uobject_fd_release+0x202/0x230 ? __uverbs_cleanup_ufile+0x270/0x270 ? locks_remove_file+0x282/0x3d0 ? security_file_free+0xaa/0xd0 __fput+0x2be/0x770 task_work_run+0x10e/0x1b0 exit_to_usermode_loop+0x145/0x170 do_syscall_64+0x2d0/0x390 ? prepare_exit_to_usermode+0x17a/0x230 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x414da7 Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 f4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 36 fc ff ff 8b 44 24 RSP: 002b:00007fff39d379d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000414da7 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 00007fff39d37a3c R08: 0000000400000000 R09: 0000000400000000 R10: 00007fff39d37910 R11: 0000000000000293 R12: 0000000000000001 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003 Reorder so that fd_install() is the last thing done in rdma_alloc_commit_uobject(). Fixes: aba94548c9e4 ("IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit") Link: https://lore.kernel.org/r/20200716102059.1420681-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10RDMA/counter: Allow manually bind QPs with different pids to same counterGravatar Mark Zhang 1-1/+1
In manual mode allow bind user QPs with different pids to same counter, since this is allowed in auto mode. Bind kernel QPs and user QPs to the same counter are not allowed. Fixes: 1bd8e0a9d0fd ("RDMA/counter: Allow manual mode configuration support") Link: https://lore.kernel.org/r/20200702082933.424537-4-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10RDMA/counter: Only bind user QPs in auto modeGravatar Mark Zhang 1-1/+1
In auto mode only bind user QPs to a dynamic counter, since this feature is mainly used for system statistic and diagnostic purpose, while there's no need to counter kernel QPs so far. Fixes: 99fa331dc862 ("RDMA/counter: Add "auto" configuration mode support") Link: https://lore.kernel.org/r/20200702082933.424537-3-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10RDMA/counter: Add PID category support in auto modeGravatar Mark Zhang 2-17/+11
With the "PID" category QPs have same PID will be bound to same counter; If this category is not set then QPs have different PIDs will be bound to same counter. This is implemented for 2 reasons: 1. The counter is a limited resource, while there may be dozens of applications, each of which creates several types of QPs, which means it may doesn't have enough counter. 2. The system administrator needs all QPs created by all applications with same type bound to one counter. The counter name and PID is only make sense when "PID" category are configured. This category can also be used in combine with others, e.g. QP type. Link: https://lore.kernel.org/r/20200702082933.424537-2-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA: Move XRCD to be under ib_core responsibilityGravatar Leon Romanovsky 2-9/+20
Update the code to allocate and free ib_xrcd structure in the ib_core instead of inside drivers. Link: https://lore.kernel.org/r/20200630101855.368895-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Create and destroy counters in the ib_coreGravatar Leon Romanovsky 2-8/+10
Move allocation and destruction of counters under ib_core responsibility Link: https://lore.kernel.org/r/20200630101855.368895-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06IB/uverbs: Expose UAPI to query MRGravatar Yishai Hadas 1-1/+51
Expose UAPI to query MR, this will let user space application that didn't allocate the MR but has access to by owning the matching command FD to retrieve its information. Link: https://lore.kernel.org/r/20200630093916.332097-8-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/mlx5: Implement the query ucontext functionalityGravatar Yishai Hadas 1-0/+1
Implement the query ucontext functionality by returning the original ucontext data as part of an extra mlx5 attribute that holds the driver UAPI response. Link: https://lore.kernel.org/r/20200630093916.332097-6-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06IB/uverbs: Expose UAPI to query ucontextGravatar Yishai Hadas 2-1/+41
Expose UAPI to query ucontext, this will let user space application that didn't allocate the ucontext but has access to by owning the matching command FD to retrieve the ucontext information. Link: https://lore.kernel.org/r/20200630093916.332097-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06IB/uverbs: Set IOVA on IB MR in uverbs layerGravatar Yishai Hadas 1-0/+4
Set IOVA on IB MR in uverbs layer to let all drivers have it, this includes both reg/rereg MR flows. As part of this change cleaned-up this setting from the drivers that already did it by themselves in their user flows. Fixes: e6f0330106f4 ("mlx4_ib: set user mr attributes in struct ib_mr") Link: https://lore.kernel.org/r/20200630093916.332097-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06IB/uverbs: Enable CQ ioctl commands by defaultGravatar Yishai Hadas 1-3/+0
Enable CQ ioctl commands by default, this functionality is fully mature to be used over ioctl, no reason to maintain any more the EXP KCONFIG entry to enable it. Link: https://lore.kernel.org/r/20200630093916.332097-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Optimize XRC target lookupGravatar Maor Gottlieb 1-36/+21
Replace the mutex with read write semaphore and use xarray instead of linked list for XRC target QPs. This will give faster XRC target lookup. In addition, when QP is closed, don't insert it back to the xarray if the destroy command failed. Link: https://lore.kernel.org/r/20200706122716.647338-4-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Clean ib_alloc_xrcd() and reuse it to allocate XRC domainGravatar Maor Gottlieb 2-15/+21
ib_alloc_xrcd() already does the required initialization, so move the uverbs to call it and save code duplication, while cleaning the function argument lists of that function. Link: https://lore.kernel.org/r/20200706122716.647338-3-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA: Remove the udata parameter from alloc_mr callbackGravatar Gal Pressman 1-1/+1
Allocating an MR flow can only be initiated by kernel users, and not from userspace so a udata parameter is redundant. Link: https://lore.kernel.org/r/20200706120343.10816-4-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Remove ib_alloc_mr_user functionGravatar Gal Pressman 1-6/+5
Allocating an MR flow can only be initiated by kernel users, and not from userspace. As a result, the udata parameter is always being passed as NULL. Rename ib_alloc_mr_user function to ib_alloc_mr and remove the udata parameter. Link: https://lore.kernel.org/r/20200706120343.10816-3-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Check for error instead of success in alloc MR functionGravatar Gal Pressman 1-12/+13
The common kernel pattern is to check for error, not success. Flip the if statement accordingly and keep the main flow unindented. Link: https://lore.kernel.org/r/20200706120343.10816-2-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Clean up tracepoint headersGravatar Chuck Lever 1-2/+0
There's no need for core/trace.c to include rdma/ib_verbs.h twice. Link: https://lore.kernel.org/r/20200702141946.3775.51943.stgit@klimt.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02IB/sa: Resolv use-after-free in ib_nl_make_request()Gravatar Divya Indi 1-21/+17
There is a race condition where ib_nl_make_request() inserts the request data into the linked list but the timer in ib_nl_request_timeout() can see it and destroy it before ib_nl_send_msg() is done touching it. This could happen, for instance, if there is a long delay allocating memory during nlmsg_new() This causes a use-after-free in the send_mad() thread: [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core] [ <ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa] [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm] [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma] [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma] [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm] [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr] [<ffffffff810a02f9>] process_one_work+0x169/0x4a0 [<ffffffff810a0b2b>] worker_thread+0x5b/0x560 [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50 [<ffffffff810a68fb>] kthread+0xcb/0xf0 [<ffffffff816ec49a>] ? __schedule+0x24a/0x810 [<ffffffff816ec49a>] ? __schedule+0x24a/0x810 [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180 [<ffffffff816f25a7>] ret_from_fork+0x47/0x90 [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180 The ownership rule is once the request is on the list, ownership transfers to the list and the local thread can't touch it any more, just like for the normal MAD case in send_mad(). Thus, instead of adding before send and then trying to delete after on errors, move the entire thing under the spinlock so that the send and update of the lists are atomic to the conurrent threads. Lightly reoganize things so spinlock safe memory allocations are done in the final NL send path and the rest of the setup work is done before and outside the lock. Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending") Link: https://lore.kernel.org/r/1592964789-14533-1-git-send-email-divya.indi@oracle.com Signed-off-by: Divya Indi <divya.indi@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02RDMA/core: Fix bogus WARN_ON during ib_unregister_device_queued()Gravatar Jason Gunthorpe 1-3/+8
ib_unregister_device_queued() can only be used by drivers using the new dealloc_device callback flow, and it has a safety WARN_ON to ensure drivers are using it properly. However, if unregister and register are raced there is a special destruction path that maintains the uniform error handling semantic of 'caller does ib_dealloc_device() on failure'. This requires disabling the dealloc_device callback which triggers the WARN_ON. Instead of using NULL to disable the callback use a special function pointer so the WARN_ON does not trigger. Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration") Link: https://lore.kernel.org/r/0-v1-a36d512e0a99+762-syz_dealloc_driver_jgg@nvidia.com Reported-by: syzbot+4088ed905e4ae2b0e13b@syzkaller.appspotmail.com Suggested-by: Hillf Danton <hdanton@sina.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24RDMA/core: Delete not-used create RWQ table functionGravatar Leon Romanovsky 1-39/+0
The RWQ table is used for RSS uverbs and not in used for the kernel consumers, delete ib_create_rwq_ind_table() routine that is not called at all. Link: https://lore.kernel.org/r/20200624105422.1452290-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/mad: Delete RMPP_STATE_CANCELING stateGravatar Shay Drory 1-13/+4
The cancel_delayed_work can be called under lock since it doesn't sleep. This makes the RMPP_STATE_CANCELING state not needed anymore, remove it. Link: https://lore.kernel.org/r/20200621104738.54850-5-leon@kernel.org Signed-off-by: Shay Drory <shayd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/mad: Change atomics to refcount APIGravatar Shay Drory 3-14/+14
The refcount API provides better safety than atomics API. Therefore, change atomic functions to refcount functions. Link: https://lore.kernel.org/r/20200621104738.54850-4-leon@kernel.org Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24IB/mad: Issue complete whenever decrements agent refcountGravatar Shay Drory 1-7/+7
Replace calls of atomic_dec() to mad_agent_priv->refcount with calls to deref_mad_agent() in order to issue complete. Most likely the refcount is > 1 at these points, but it is difficult to prove. Performance is not important on these paths, so be obviously correct. Link: https://lore.kernel.org/r/20200621104738.54850-3-leon@kernel.org Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>