From 071ba4cc559de47160761b9500b72e8fa09d923d Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe
Date: Mon, 26 Oct 2020 11:25:49 -0300
Subject: RDMA: Add rdma_connect_locked()

There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED: either the
handler triggers a completion and another thread does rdma_connect(), or
the handler directly calls rdma_connect().

In all cases rdma_connect() needs to hold the handler_mutex, but when
handlers are invoked this is already held by the core code. This causes
ULPs using the second method to deadlock.

Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.

Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang
Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar
Acked-by: Jack Wang
Reviewed-by: Christoph Hellwig
Reviewed-by: Max Gurtovoy
Reviewed-by: Sagi Grimberg
Signed-off-by: Jason Gunthorpe
---
 drivers/nvme/host/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'drivers/nvme')

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index aad829a2b50d..8bbc48cc45dc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1890,10 +1890,10 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
 		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
 	}
 
-	ret = rdma_connect(queue->cm_id, &param);
+	ret = rdma_connect_locked(queue->cm_id, &param);
 	if (ret) {
 		dev_err(ctrl->ctrl.device,
-			"rdma_connect failed (%d).\n", ret);
+			"rdma_connect_locked failed (%d).\n", ret);
 		goto out_destroy_queue_ib;
 	}
 
-- 
cgit v1.2.3
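
Note (not part of the patch): a minimal sketch of the second flow described in the
commit message, where a ULP's CM event handler calls connect directly from the
RDMA_CM_EVENT_ROUTE_RESOLVED callback while the core already holds handler_mutex.
The ULP names (my_ulp_*) are hypothetical; only the RDMA/CM identifiers come from
the in-kernel API and this patch.

#include <rdma/rdma_cm.h>

static int my_ulp_cm_handler(struct rdma_cm_id *cm_id,
			     struct rdma_cm_event *event)
{
	struct rdma_conn_param param = {};
	int ret;

	switch (event->event) {
	case RDMA_CM_EVENT_ROUTE_RESOLVED:
		/*
		 * The CM core invokes this handler with cm_id's
		 * handler_mutex already held, so calling rdma_connect()
		 * here would deadlock.  Use the _locked variant added by
		 * this patch instead.
		 */
		ret = rdma_connect_locked(cm_id, &param);
		if (ret)
			return ret;
		break;
	default:
		break;
	}
	return 0;
}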