aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-04-12rds: ib: Remove two ib_modify_qp() callsGravatar Håkon Bugge 2-34/+2
For some HCAs, ib_modify_qp() is an expensive operation running virtualized. For both the active and passive side, the QP returned by the CM has the state set to RTS, so no need for this excess RTS -> RTS transition. With IB Core's ability to set the RNR Retry timer, we use this interface to shave off another ib_modify_qp(). Fixes: ec16227e1414 ("RDS/IB: Infiniband transport") Link: https://lore.kernel.org/r/1617216194-12890-3-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12IB/cma: Introduce rdma_set_min_rnr_timer()Gravatar Håkon Bugge 3-0/+45
Introduce the ability for kernel ULPs to adjust the minimum RNR Retry timer. The INIT -> RTR transition executed by RDMA CM will be used for this adjustment. This avoids an additional ib_modify_qp() call. rdma_set_min_rnr_timer() must be called before the call to rdma_connect() on the active side and before the call to rdma_accept() on the passive side. The default value of RNR Retry timer is zero, which translates to 655 ms. When the receiver is not ready to accept a send messages, it encodes the RNR Retry timer value in the NAK. The requestor will then wait at least the specified time value before retrying the send. The 5-bit value to be supplied to the rdma_set_min_rnr_timer() is documented in IBTA Table 45: "Encoding for RNR NAK Timer Field". Link: https://lore.kernel.org/r/1617216194-12890-2-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/i40iw: Use DEFINE_SPINLOCK() for spinlockGravatar Ye Bin 1-2/+1
spinlock can be initialized automatically with DEFINE_SPINLOCK() rather than explicitly calling spin_lock_init(). Link: https://lore.kernel.org/r/20210409095147.2294269-1-yebin10@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/bnxt_re: Fix error return code in bnxt_qplib_cq_process_terminal()Gravatar Wang Wensheng 1-0/+1
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/20210408113137.97202-1-wangwensheng4@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12IB/hfi1: Fix error return code in parse_platform_config()Gravatar Wang Wensheng 1-0/+1
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20210408113140.103032-1-wangwensheng4@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/qedr: Fix error return code in qedr_iw_connect()Gravatar Wang Wensheng 1-1/+3
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 82af6d19d8d9 ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr") Link: https://lore.kernel.org/r/20210408113135.92165-1-wangwensheng4@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/core: Correct format of block commentsGravatar Wenpeng Liang 4-4/+8
Block comments should not use a trailing */ on a separate line and every line of a block comment should start with an '*'. Link: https://lore.kernel.org/r/1617783353-48249-7-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/core: Correct format of bracesGravatar Wenpeng Liang 5-28/+25
Do following cleanups about braces: - Add the necessary braces to maintain context alignment. - Fix the open '{' that is not on the same line as "switch". - Remove braces that are not necessary for single statement blocks. - Fix "else" that doesn't follow close brace '}'. Link: https://lore.kernel.org/r/1617783353-48249-6-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/core: Remove redundant spacesGravatar Wenpeng Liang 6-48/+48
Space is not required after '(', before ')', before ',' and between '*' and symbol name of a definition. Link: https://lore.kernel.org/r/1617783353-48249-5-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/core: Add necessary spacesGravatar Wenpeng Liang 3-5/+5
Space is required before '(' of switch statements and around '='. Link: https://lore.kernel.org/r/1617783353-48249-4-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/core: Remove the redundant return statementsGravatar Wenpeng Liang 2-3/+0
The return statements at the end of a void function is meaningless. Link: https://lore.kernel.org/r/1617783353-48249-3-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/core: Print the function name by __func__ instead of an fixed stringGravatar Wenpeng Liang 3-19/+15
It's better to use __func__ than a fixed string to print a function's name. Link: https://lore.kernel.org/r/1617783353-48249-2-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12Merge branch 'mlx5-next' of ↵Gravatar Jason Gunthorpe 22-62/+363
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== This pr contains changes from mlx5-next branch, already reviewed on netdev and rdma mailing lists, links below. 1) From Leon, Dynamically assign MSI-X vectors count Already Acked by Bjorn Helgaas. https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/ 2) Cleanup series: https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/ From Mark, E-Switch cleanups and refactoring, and the addition of single FDB mode needed HW bits. From Mikhael, Remove unused struct field From Saeed, Cleanup W=1 prototype warning From Zheng, Esw related cleanup From Tariq, User order-0 page allocation for EQs ==================== * mlx5-next: net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks net/mlx5: Dynamically assign MSI-X vectors count net/mlx5: Add dynamic MSI-X capabilities bits PCI/IOV: Add sysfs MSI-X vector assignment interface net/mlx5: Use order-0 allocations for EQs net/mlx5: Add IFC bits needed for single FDB mode net/mlx5: E-Switch, Refactor send to vport to be more generic RDMA/mlx5: Use representor E-Switch when getting netdev and metadata net/mlx5: E-Switch, Add eswitch pointer to each representor net/mlx5: E-Switch, Add match on vhca id to default send rules net/mlx5: Remove unused mlx5_core_health member recover_work net/mlx5: simplify the return expression of mlx5_esw_offloads_pair() net/mlx5: Cleanup prototype warning Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Prevent le32 from being implicitly converted to u32Gravatar Lang Cheng 1-8/+7
Replace BUILD_BUG_ON_ZERO() with BUILD_BUG_ON() to avoid sparse complaining "restricted __le32 degrades to integer". Link: https://lore.kernel.org/r/1617354454-47840-10-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Simplify the function config_eqc()Gravatar Yixing Liu 2-185/+65
Use "hr_reg_write" replace "roce_set_filed". Link: https://lore.kernel.org/r/1617354454-47840-9-git-send-email-liweihang@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Add XRC subtype in QPC and XRC type in SRQCGravatar Wenpeng Liang 2-7/+11
A field to distuiguish basic SRQ from XRC SRQ in SRQC and a field in QPC to determine whether a QP is XRC TGT QP or XRC INI QP are missing. Fixes: 32548870d438 ("RDMA/hns: Add support for XRC on HIP09") Link: https://lore.kernel.org/r/1617354454-47840-8-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Remove unsupported QP typesGravatar Wenpeng Liang 3-5/+1
The hns ROCEE does not support UC QP currently. Link: https://lore.kernel.org/r/1617354454-47840-7-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Delete unused members in the structure hns_roce_hwGravatar Yangyang Li 3-32/+0
Some structure members in hns_roce_hw have never been used and need to be deleted. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Fixes: b156269d88e4 ("RDMA/hns: Add modify CQ support for hip08") Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode") Link: https://lore.kernel.org/r/1617354454-47840-6-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Delete redundant abnormal interrupt statusGravatar Wenpeng Liang 2-16/+6
The hardware supports only two types of abnormal interrupts. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Link: https://lore.kernel.org/r/1617354454-47840-5-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Delete redundant condition judgment related to eqGravatar Yangyang Li 1-21/+6
The register value related to the eq interrupt depends only on enable_flag, so the redundant condition judgment is deleted. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Link: https://lore.kernel.org/r/1617354454-47840-4-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Fix missing assignment of max_inline_dataGravatar Weihang Li 1-0/+1
When querying QP, the ULPs should be informed of the max length of inline data supported by the hardware. Fixes: 30b707886aeb ("RDMA/hns: Support inline data in extented sge space for RC") Link: https://lore.kernel.org/r/1617354454-47840-3-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Avoid enabling RQ inline on UDGravatar Weihang Li 2-3/+8
RQ inline is not supported on UD/GSI QP, it should be disabled in QPC. Link: https://lore.kernel.org/r/1617354454-47840-2-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Modify prints for mailbox and command queueGravatar Wenpeng Liang 2-10/+20
Use ratelimited print in mbox and cmq. And print mailbox operation if mailbox fails because it's useful information for the user. Link: https://lore.kernel.org/r/1617262341-37571-4-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Support more return types of command queueGravatar Lang Cheng 1-4/+14
Add error code definition according to the return code from firmware to help find out more detailed reasons why a command fails to be sent. Link: https://lore.kernel.org/r/1617262341-37571-3-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/hns: Enable all CMDQ contextGravatar Lang Cheng 2-41/+27
Fix error of cmd's context number calculation algorithm to enable all of 32 cmd entries and support 32 concurrent accesses. Link: https://lore.kernel.org/r/1617262341-37571-2-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-08RDMA/rxe: Fix missing acks from responderGravatar Bob Pearson 2-11/+8
All responder errors from request packets that do not consume a receive WQE fail to generate acks for RC QPs. This patch corrects this behavior by making the flow follow the same path as request packets that do consume a WQE after the completion. Link: https://lore.kernel.org/r/20210402001016.3210-1-rpearson@hpe.com Link: https://lore.kernel.org/linux-rdma/1a7286ac-bcea-40fb-2267-480134dd301b@gmail.com/ Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07RDMA/core: Make the wc status prompt message clearerGravatar Yixian Liu 1-2/+2
Local invalidate is also a kind of memory management operation, not only memory bind operation. Furthermore, as invalidate operations include local and remote, add prefix to the prompt message to make it clearer. Link: https://lore.kernel.org/r/1617698772-13871-1-git-send-email-liweihang@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07RDMA/hns: Use GFP_ATOMIC under spin lockGravatar Wei Yongjun 1-1/+1
A spin lock is taken here so we should use GFP_ATOMIC. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Link: https://lore.kernel.org/r/20210407154900.3486268-1-weiyongjun1@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/mlx5: Reduce max order of memory allocated for xlt updateGravatar Praveen Kumar Kannoju 1-1/+1
To update xlt (during mlx5_ib_reg_user_mr()), the driver can request up to 1 MB (order-8) memory, depending on the size of the MR. This costly allocation can sometimes take very long to return (a few seconds). This causes the calling application to hang for a long time, especially when the system is fragmented. To avoid these long latency spikes, the calls the higher order allocations need to fail faster in case they are not available. In order to acheive this we need __GFP_NORETRY flag in the gfp_mask before during fetching the free pages. Allow the algorithm to automatically fall back to smaller page sizes. Link: https://lore.kernel.org/r/1617425635-35631-1-git-send-email-praveen.kannoju@oracle.com Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Remove unused functionGravatar Kaike Wan 1-18/+0
Remove the unused function sdma_iowait_schedule(). Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/1617026791-89997-1-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Use kzalloc() for mmu_rb_handler allocationGravatar Mike Marciniszyn 1-1/+1
The code currently assumes that the mmu_notifier struct embedded in mmu_rb_handler only contains two fields. There are now extra fields: struct mmu_notifier { struct hlist_node hlist; const struct mmu_notifier_ops *ops; struct mm_struct *mm; struct rcu_head rcu; unsigned int users; }; Given that there in no init for the mmu_notifier, a kzalloc() should be used to insure that any newly added fields are given a predictable initial value of zero. Fixes: 06e0ffa69312 ("IB/hfi1: Re-factor MMU notification code") Link: https://lore.kernel.org/r/1617026056-50483-9-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Adam Goldman <adam.goldman@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Add additional usdma tracesGravatar Mike Marciniszyn 3-3/+85
Add traces that were vital in isolating an issue with pq waitlist in commit fa8dac396863 ("IB/hfi1: Fix another case where pq is left on waitlist") Link: https://lore.kernel.org/r/1617026056-50483-8-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Remove indirect call to hfi1_ipoib_send_dma()Gravatar Mike Marciniszyn 3-16/+8
hfi1_ipoib_send() directly calls hfi1_ipoib_send_dma() with no value add. Fix by renaming hfi1_ipoib_send_dma() to hfi1_ipoib_send(). Link: https://lore.kernel.org/r/1617026056-50483-6-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Use napi_schedule_irqoff() for tx napiGravatar Mike Marciniszyn 1-1/+1
The call is from an ISR context and napi_schedule_irqoff() can be used. Change the call to the more efficient type. Link: https://lore.kernel.org/r/1617026056-50483-5-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Correct oversized ring allocationGravatar Mike Marciniszyn 2-8/+9
The completion ring for tx is using the wrong size to size the ring, oversizing the ring by two orders of magniture. Correct the allocation size and use kcalloc_node() to allocate the ring. Fix mistaken GFP defines in similar allocations. Link: https://lore.kernel.org/r/1617026056-50483-4-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdevGravatar Mike Marciniszyn 5-0/+39
The current rdma_netdev handling in ipoib hooks the tx_timeout handler, but prints out a totally useless message that prevents effective debugging especially when multiple transmit queues are being used. Add a tx_timeout rdma_netdev hook and implement the callback in the hfi1 to print additional information. The existing non-helpful message is avoided when the driver has presented a callback. Link: https://lore.kernel.org/r/1617026056-50483-3-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Add AIP tx tracesGravatar Mike Marciniszyn 2-3/+119
Add traces to allow for debugging issues with AIP tx. Link: https://lore.kernel.org/r/1617026056-50483-2-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-04net/mlx5: Implement sriov_get_vf_total_msix/count() callbacksGravatar Leon Romanovsky 3-0/+44
The mlx5 implementation executes a firmware command on the PF to change the configuration of the selected VF. Link: https://lore.kernel.org/linux-pci/20210314124256.70253-5-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-04-04net/mlx5: Dynamically assign MSI-X vectors countGravatar Leon Romanovsky 4-2/+93
The number of MSI-X vectors is a PCI property visible through lspci. The field is read-only and configured by the device. The mlx5 devices work in a static or dynamic assignment mode. Static assignment means that all newly created VFs have a preset number of MSI-X vectors determined by device configuration parameters. This can result in some VFs having too many or too few MSI-X vectors. Till now this has been the only means of fine-tuning the MSI-X vector count and it was acceptable for small numbers of VFs. With dynamic assignment the inefficiency of having a fixed number of MSI-X vectors can be avoided with each VF having exactly the required vectors. Userspace will provide this information while provisioning the VF for use, based on the intended use. For instance if being used with a VM, the MSI-X vector count might be matched to the CPU count of the VM. For compatibility mlx5 continues to start up with MSI-X vector assignment, but the kernel can now access a larger dynamic vector pool and assign more vectors to created VFs. Link: https://lore.kernel.org/linux-pci/20210314124256.70253-4-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-04-04net/mlx5: Add dynamic MSI-X capabilities bitsGravatar Leon Romanovsky 1-1/+10
These new fields declare the number of MSI-X vectors that is possible to allocate on the VF through PF configuration. Value must be in range defined by min_dynamic_vf_msix_table_size and max_dynamic_vf_msix_table_size. The driver should continue to query its MSI-X table through PCI configuration header. Link: https://lore.kernel.org/linux-pci/20210314124256.70253-3-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-04-04PCI/IOV: Add sysfs MSI-X vector assignment interfaceGravatar Leon Romanovsky 5-8/+137
A typical cloud provider SR-IOV use case is to create many VFs for use by guest VMs. The VFs may not be assigned to a VM until a customer requests a VM of a certain size, e.g., number of CPUs. A VF may need MSI-X vectors proportional to the number of CPUs in the VM, but there is no standard way to change the number of MSI-X vectors supported by a VF. Some Mellanox ConnectX devices support dynamic assignment of MSI-X vectors to SR-IOV VFs. This can be done by the PF driver after VFs are enabled, and it can be done without affecting VFs that are already in use. The hardware supports a limited pool of MSI-X vectors that can be assigned to the PF or to individual VFs. This is device-specific behavior that requires support in the PF driver. Add a read-only "sriov_vf_total_msix" sysfs file for the PF and a writable "sriov_vf_msix_count" file for each VF. Management software may use these to learn how many MSI-X vectors are available and to dynamically assign them to VFs before the VFs are passed through to a VM. If the PF driver implements the ->sriov_get_vf_total_msix() callback, "sriov_vf_total_msix" contains the total number of MSI-X vectors available for distribution among VFs. If no driver is bound to the VF, writing "N" to "sriov_vf_msix_count" uses the PF driver ->sriov_set_msix_vec_count() callback to assign "N" MSI-X vectors to the VF. When a VF driver subsequently reads the MSI-X Message Control register, it will see the new Table Size "N". Link: https://lore.kernel.org/linux-pci/20210314124256.70253-2-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-04-01RDMA/hns: Reorganize doorbell update interfaces for all queuesGravatar Yixian Liu 7-130/+119
The doorbell update interfaces are very similar for different queues, such as SQ, RQ, SRQ, CQ and EQ. So reorganize these code and also fix some inappropriate naming. Link: https://lore.kernel.org/r/1616840738-7866-3-git-send-email-liweihang@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/hns: Support configuring doorbell mode of RQ and CQGravatar Yixian Liu 5-40/+66
HIP08 supports both normal and record doorbell mode for RQ and CQ, SQ record doorbell for userspace is also supported by the software for flushing CQE process. As now the capability of HIP08 are exposed to the user and are configurable, the support of normal doorbell should be added back. Note that, if switching to normal doorbell, the kernel will report "flush cqe is unsupported" if modify qp to error status as the flush is based on record doorbell. Link: https://lore.kernel.org/r/1616840738-7866-2-git-send-email-liweihang@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/hns: Simplify command fields for HEM base address configurationGravatar Xi Wang 2-160/+83
Use hr_reg_write() instead of roce_set_field() to simplify codes about configuring HEM BA. Link: https://lore.kernel.org/r/1616815294-13434-6-git-send-email-liweihang@huawei.com Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/hns: Reorganize process of setting HEMGravatar Xi Wang 1-33/+44
Encapsulate configuring GMV base address and other type of HEM table into two separate functions to make process of setting HEM clearer. Link: https://lore.kernel.org/r/1616815294-13434-5-git-send-email-liweihang@huawei.com Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/hns: Refactor reset state checking flowGravatar Xi Wang 5-200/+220
The 'HNS_ROCE_OPC_QUERY_MB_ST' command will response the mailbox complete status and hardware busy flag, and the complete status is only valid when the busy flag is 0, so it's better to query these two fields at a time rather than separately. Link: https://lore.kernel.org/r/1616815294-13434-4-git-send-email-liweihang@huawei.com Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/hns: Reorganize hns_roce_create_cq()Gravatar Yixing Liu 1-28/+58
Encapsulate two subprocesses into functions. Link: https://lore.kernel.org/r/1616815294-13434-3-git-send-email-liweihang@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/hns: Refactor hns_roce_v2_poll_one()Gravatar Weihang Li 1-179/+224
Encapsulate the process of obtaining the current QP and filling WC as functions, also merge some duplicate code. Link: https://lore.kernel.org/r/1616815294-13434-2-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01MAINTAINERS: remove Xavier as maintainer of HISILICON ROCE DRIVERGravatar Weihang Li 1-1/+0
Wei Hu(Xavier) has left Hisilicon and his email address is invalid now. I'd be glad to add him back with another address if he wants to continue maintain this module. Link: https://lore.kernel.org/r/1617007584-39842-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/rtrs-clt: Cap max_io_sizeGravatar Jack Wang 1-1/+3
Max io size is limited by both remote buffer size and the max fr pages per mr. Link: https://lore.kernel.org/r/20210325153308.1214057-20-gi-oh.kim@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>