[RFC,0/5] VirtIO RDMA - Patchwork (2024)

Message ID 20210902130625.25277-1-weijunji@bytedance.com

Message

魏俊吉 Sept. 2, 2021, 1:06 p.m. UTC

Hi all,

This RFC aims to reopen the discussion of Virtio RDMA.
Now this is based on Yuval Shaia's RFC "VirtIO RDMA"
which implemented a frame for Virtio RDMA and a simple
control path (Not sure if Yuval Shaia has any further
plan for it).

We try to extend this work and implement a simple
data-path and a completed control path. Now this can
work with SEND, RECV and REG_MR in kernel. There is a
simple test module in this patch that can communicate
with ibv_rc_pingpong in rdma-core.

During doing this work, we have found some problems and
would like to ask for some suggestions from community:
1. Each qp need two VQ, but qemu default only support 1024 VQ.
   I think it is possible to multiplex the VQ, since the
   cmd_post_send carry the qpn in request.
2. The virtio-rdma device's gid should equal to host rdma
   device's gid. This means that we cannot use gid cache in
   rdma subsystem. And theoretically the gid should also equal
   to the device's netdev's ip address, how can we deal with
   this conflict.
3. How to support DMA mr? The verbs in host cannot support it.
   And it seems hard to pin whole guest physical memory in qemu.
4. The FRMR api need to set key of MR through IB_WR_REG_MR.
   But it is impossible to change a key of mr using uverbs.
   In our implementation, we change the key of WR while post_send,
   but this means the MR can only work with SEND and RECV since we
   cannot change the key in the remote. The final solution may be to
   implement an urdma device based on rxe in qemu, through this we
   can get full control of MR.
5. The GSI is not supported now. And we think it's a problem that
   when the host receive a GSI package, it doesn't know which
   device it belongs to.

Any further thoughts will be greatly welcomed. And we noticed that
there seems to be no existing work for virtio-rdma spec, we are
happy to start it from this RFC.

How to test with test module:

1. Set test module's SERVER_ADDR and SERVER_PORT
2. Build kernel and qemu
3. Build rdmacm-mux in qemu/contrib and run it in backend
4. Boot kernel with qemu with following args using libvirt

<interface type='bridge'>
  <mac address='00:16:3e:5d:aa:a8'/>
  <source bridge='virbr0'/>
  <target dev='vnet1'/>
  <model type='virtio'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02'
   function='0x0' multifunction='on'/>
</interface>

<qemu:commandline>
  <qemu:arg value='-chardev'/>
  <qemu:arg value='socket,path=/var/run/rdmacm-mux-rxe0-1,id=mads'/>
  <qemu:arg value='-device'/>
  <qemu:arg value='virtio-rdma-pci,disable-legacy=on,addr=2.1,ibdev=rxe0,netdev=bridge0,mad-chardev=mads'/>
  <qemu:arg value='-object'/>
  <qemu:arg value='memory-backend-ram,id=mb1,size=1G,share'/>
  <qemu:arg value='-numa'/>
  <qemu:arg value='node,memdev=mb1'/>
</qemu:commandline>

Note that virtio-net and virtio-rdma should be in same slot's
function 0 and function 1.

5. Run "ibv_rc_pingpong -g 1 -n 500 -s 20480" as server
6. Run "insmod virtio_rdma_rc_pingping_client.ko" in guest

One note regarding the patchset.
We know it's not standard to collapse patches from two repos. But in
order to display the whole work of Virtio RDMA, we still did it.

Thanks.

patch1: RDMA/virtio-rdma Introduce a new core cap prot (linux)
patch2: RDMA/virtio-rdma: VirtIO RDMA driver (linux)
        The main patch of virtio-rdma driver in linux kernel
patch3: RDMA/virtio-rdma: VirtIO RDMA test module (linux)
        A test module
patch4: virtio-net: Move some virtio-net-pci decl to include/hw/virtio (qemu)
        Patch from Yuval Shaia
patch5: hw/virtio-rdma: VirtIO rdma device (qemu)
        The main patch of virtio-rdma device in qemu
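
As a rough illustration of problem 1 above: if every QP owns two
dedicated virtqueues, the 1024-VQ default caps the number of QPs a
device can expose. The sketch below only does that arithmetic. The
function name and the reserved_vqs parameter (virtqueues set aside for
control or completion use) are assumptions for illustration and are not
taken from the patches.

```python
def max_queue_pairs(total_vqs: int = 1024,
                    vqs_per_qp: int = 2,
                    reserved_vqs: int = 0) -> int:
    """Upper bound on QPs when each QP owns `vqs_per_qp` virtqueues.

    total_vqs    -- virtqueues the device may expose (1024 is the QEMU
                    default quoted in the cover letter)
    vqs_per_qp   -- dedicated virtqueues per QP (two, per the cover letter)
    reserved_vqs -- hypothetical count of virtqueues used for anything
                    other than QPs (an assumption, not from the patches)
    """
    if total_vqs < 0 or reserved_vqs < 0 or vqs_per_qp <= 0:
        raise ValueError("counts must be non-negative and vqs_per_qp positive")
    usable = max(total_vqs - reserved_vqs, 0)
    return usable // vqs_per_qp


if __name__ == "__main__":
    print(max_queue_pairs())                # no virtqueues reserved
    print(max_queue_pairs(reserved_vqs=3))  # three virtqueues reserved
```

With no reserved virtqueues the bound is 512 QPs, which is why the
cover letter asks about multiplexing or raising the limit.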

Comments

Jason Wang Sept. 3, 2021, 12:57 a.m. UTC | #1

On Thu, Sep 2, 2021 at 9:07 PM Junji Wei <weijunji@bytedance.com> wrote:
>
> Hi all,
>
> This RFC aims to reopen the discussion of Virtio RDMA.
> Now this is based on Yuval Shaia's RFC "VirtIO RDMA"
> which implemented a frame for Virtio RDMA and a simple
> control path (Not sure if Yuval Shaia has any further
> plan for it).
>
> We try to extend this work and implement a simple
> data-path and a completed control path. Now this can
> work with SEND, RECV and REG_MR in kernel. There is a
> simple test module in this patch that can communicate
> with ibv_rc_pingpong in rdma-core.
>
> During doing this work, we have found some problems and
> would like to ask for some suggestions from community:

I think it would be beneficial if you can post a spec patch.

Thanks

魏俊吉 Sept. 3, 2021, 7:41 a.m. UTC | #2

> On Sep 3, 2021, at 8:57 AM, Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Sep 2, 2021 at 9:07 PM Junji Wei <weijunji@bytedance.com> wrote:
>>
>> Hi all,
>>
>> This RFC aims to reopen the discussion of Virtio RDMA.
>> Now this is based on Yuval Shaia's RFC "VirtIO RDMA"
>> which implemented a frame for Virtio RDMA and a simple
>> control path (Not sure if Yuval Shaia has any further
>> plan for it).
>>
>> We try to extend this work and implement a simple
>> data-path and a completed control path. Now this can
>> work with SEND, RECV and REG_MR in kernel. There is a
>> simple test module in this patch that can communicate
>> with ibv_rc_pingpong in rdma-core.
>>
>> During doing this work, we have found some problems and
>> would like to ask for some suggestions from community:
>
> I think it would be beneficial if you can post a spec patch.

Ok, I will do it.

Thanks

Jason Gunthorpe Sept. 15, 2021, 1:43 p.m. UTC | #3

On Thu, Sep 02, 2021 at 09:06:20PM +0800, Junji Wei wrote:
> Hi all,
>
> This RFC aims to reopen the discussion of Virtio RDMA.
> Now this is based on Yuval Shaia's RFC "VirtIO RDMA"
> which implemented a frame for Virtio RDMA and a simple
> control path (Not sure if Yuval Shaia has any further
> plan for it).
>
> We try to extend this work and implement a simple
> data-path and a completed control path. Now this can
> work with SEND, RECV and REG_MR in kernel. There is a
> simple test module in this patch that can communicate
> with ibv_rc_pingpong in rdma-core.
>
> During doing this work, we have found some problems and
> would like to ask for some suggestions from community:

These seem like serious problems! Shouldn't these be solved before
sending patches?

> 1. Each qp need two VQ, but qemu default only support 1024 VQ.
>    I think it is possible to multiplex the VQ, since the
>    cmd_post_send carry the qpn in request.

QPs and CQs need to have predictable fixed WQE sizes, I don't know how
you can reasonably expect to map them to a shared queue.

> 2. The virtio-rdma device's gid should equal to host rdma
>    device's gid. This means that we cannot use gid cache in
>    rdma subsystem. And theoretically the gid should also equal
>    to the device's netdev's ip address, how can we deal with
>    this conflict.

You have to follow the correct semantics, the GID flows from the guest
into the host and updates the hosts GID table, not the other way
around.

> 3. How to support DMA mr? The verbs in host cannot support it.
>    And it seems hard to pin whole guest physical memory in qemu.

Either you have to trap the FRWR in the hypervisor and pin the memory,
remap the MR, etc or you have to pin the entire guest and rely on
something like memory windows to emulate FRWR.

> 4. The FRMR api need to set key of MR through IB_WR_REG_MR.
>    But it is impossible to change a key of mr using uverbs.

FRMR is more like memory windows in user space, you can't support it
using just regular MRs.

>    In our implementation, we change the key of WR while post_send,
>    but this means the MR can only work with SEND and RECV since we
>    cannot change the key in the remote.

Yes, this is not a realistic solution

> 5. The GSI is not supported now. And we think it's a problem that
>    when the host receive a GSI package, it doesn't know which
>    device it belongs to.

Of course, GSI packets are not virtualized. You need to somehow
capture GSI messages for the entire GID that the guest is using. We
don't have any API to do this in userspace.

Jason
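
To make the direction of flow in the reply to problem 2 concrete, here
is a small sketch under stated assumptions. For RoCE v2, a GID that
corresponds to an IPv4 address is the IPv4-mapped IPv6 form
(::ffff:a.b.c.d). The guest derives the GID from its netdev address and
pushes it to the host, which records it. HostGidTable and its method
are hypothetical names for illustration only; they are not part of the
patches, QEMU, or the kernel.

```python
import ipaddress


def ipv4_to_roce_gid(addr: str) -> ipaddress.IPv6Address:
    """Return the IPv4-mapped IPv6 GID (::ffff:a.b.c.d) for an IPv4 address."""
    v4 = ipaddress.IPv4Address(addr)
    # 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
    return ipaddress.IPv6Address(b"\x00" * 10 + b"\xff\xff" + v4.packed)


class HostGidTable:
    """Hypothetical host-side table that is written by the guest."""

    def __init__(self) -> None:
        self.entries: dict[int, ipaddress.IPv6Address] = {}

    def update_from_guest(self, index: int, gid: ipaddress.IPv6Address) -> None:
        # The guest is the source of truth; the host only mirrors it.
        self.entries[index] = gid


if __name__ == "__main__":
    table = HostGidTable()
    # The address and GID index here are made up for the example.
    table.update_from_guest(1, ipv4_to_roce_gid("192.168.122.10"))
    print(table.entries[1].ipv4_mapped)
```

The point of the sketch is only the ownership: the guest computes the
GID from its own address and the host table follows, rather than the
guest inheriting the host device's GID.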

魏俊吉 Sept. 22, 2021, 12:08 p.m. UTC | #4

> On Sep 15, 2021, at 9:43 PM, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Sep 02, 2021 at 09:06:20PM +0800, Junji Wei wrote:
>> Hi all,
>>
>> This RFC aims to reopen the discussion of Virtio RDMA.
>> Now this is based on Yuval Shaia's RFC "VirtIO RDMA"
>> which implemented a frame for Virtio RDMA and a simple
>> control path (Not sure if Yuval Shaia has any further
>> plan for it).
>>
>> We try to extend this work and implement a simple
>> data-path and a completed control path. Now this can
>> work with SEND, RECV and REG_MR in kernel. There is a
>> simple test module in this patch that can communicate
>> with ibv_rc_pingpong in rdma-core.
>>
>> During doing this work, we have found some problems and
>> would like to ask for some suggestions from community:
>
> These seem like serious problems! Shouldn't these be solved before
> sending patches?
>
>> 1. Each qp need two VQ, but qemu default only support 1024 VQ.
>>    I think it is possible to multiplex the VQ, since the
>>    cmd_post_send carry the qpn in request.
>
> QPs and CQs need to have predictable fixed WQE sizes, I don't know how
> you can reasonably expect to map them to a shared queue.

Yes, it is a bad idea to multiplex the VQ. If we need more VQ,
we can extend QEMU and virtio spec.

>> 2. The virtio-rdma device's gid should equal to host rdma
>>    device's gid. This means that we cannot use gid cache in
>>    rdma subsystem. And theoretically the gid should also equal
>>    to the device's netdev's ip address, how can we deal with
>>    this conflict.
>
> You have to follow the correct semantics, the GID flows from the guest
> into the host and updates the hosts GID table, not the other way
> around.

Sure, this is my misunderstanding.

>> 3. How to support DMA mr? The verbs in host cannot support it.
>>    And it seems hard to pin whole guest physical memory in qemu.
>
> Either you have to trap the FRWR in the hypervisor and pin the memory,
> remap the MR, etc or you have to pin the entire guest and rely on
> something like memory windows to emulate FRWR.

We want to implement an emulated RDMA device in userspace. Since
we can directly access guest's physical memory in QEMU, it will be
easy to support DMA mr.

>> 4. The FRMR api need to set key of MR through IB_WR_REG_MR.
>>    But it is impossible to change a key of mr using uverbs.
>
> FRMR is more like memory windows in user space, you can't support it
> using just regular MRs.

It is hard to support this using uverbs, but it is easy to support
with uRDMA that we can get full control of mrs.

>> 5. The GSI is not supported now. And we think it's a problem that
>>    when the host receive a GSI package, it doesn't know which
>>    device it belongs to.
>
> Of course, GSI packets are not virtualized. You need to somehow
> capture GSI messages for the entire GID that the guest is using. We
> don't have any API to do this in userspace.

If we implement uRDMA device in QEMU, there is no need to distinguish
which device it belongs to, because there is only one device.

Thanks.
Junji
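
The limitation discussed under problem 4 can be pictured with a toy
model. It is a sketch under assumptions, not code from the patches:
the guest picks an MR key, the host MR gets a different host-assigned
key, and the backend can only rewrite keys on work requests that pass
through it. KeyRewritingBackend and all of its methods are hypothetical
names.

```python
class KeyRewritingBackend:
    """Toy model of rewriting guest MR keys to host MR keys at post_send."""

    def __init__(self) -> None:
        self._guest_to_host: dict[int, int] = {}
        self._host_keys: set[int] = set()
        self._next_host_key = 0x1000  # arbitrary starting value for the model

    def reg_mr(self, guest_key: int) -> int:
        """Register an MR; the host assigns its own key for it."""
        host_key = self._next_host_key
        self._next_host_key += 1
        self._guest_to_host[guest_key] = host_key
        self._host_keys.add(host_key)
        return host_key

    def post_send(self, guest_lkey: int) -> int:
        """Local work requests pass through the backend, so the key can be swapped."""
        return self._guest_to_host[guest_lkey]

    def remote_access_allowed(self, rkey: int) -> bool:
        """Model of an incoming RDMA READ/WRITE: it is checked against host
        keys only, and the backend gets no chance to translate it."""
        return rkey in self._host_keys


if __name__ == "__main__":
    backend = KeyRewritingBackend()
    host_key = backend.reg_mr(guest_key=0x42)
    print(backend.post_send(0x42) == host_key)    # local SEND/RECV path works
    print(backend.remote_access_allowed(0x42))    # peer using the guest key fails
```

In this model a peer that was handed the guest-chosen key cannot reach
the MR, which matches the observation that the approach only works for
SEND and RECV.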

Leon Romanovsky Sept. 22, 2021, 1:06 p.m. UTC | #5

On Wed, Sep 22, 2021 at 08:08:44PM +0800, Junji Wei wrote:
> > On Sep 15, 2021, at 9:43 PM, Jason Gunthorpe <jgg@nvidia.com> wrote:

<...>

> >> 4. The FRMR api need to set key of MR through IB_WR_REG_MR.
> >>    But it is impossible to change a key of mr using uverbs.
> >
> > FRMR is more like memory windows in user space, you can't support it
> > using just regular MRs.
>
> It is hard to support this using uverbs, but it is easy to support
> with uRDMA that we can get full control of mrs.

What is uRDMA?

Thanks

魏俊吉 Sept. 22, 2021, 1:37 p.m. UTC | #6

On Wed, Sep 22, 2021 at 9:06 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Sep 22, 2021 at 08:08:44PM +0800, Junji Wei wrote:
> > > On Sep 15, 2021, at 9:43 PM, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> <...>
>
> > >> 4. The FRMR api need to set key of MR through IB_WR_REG_MR.
> > >>    But it is impossible to change a key of mr using uverbs.
> > >
> > > FRMR is more like memory windows in user space, you can't support it
> > > using just regular MRs.
> >
> > It is hard to support this using uverbs, but it is easy to support
> > with uRDMA that we can get full control of mrs.
>
> What is uRDMA?

uRDMA is a software implementation of the RoCEv2 protocol like rxe.
We will implement it in QEMU with VFIO or DPDK.

Thanks.
Junji

Leon Romanovsky Sept. 22, 2021, 1:59 p.m. UTC | #7

On Wed, Sep 22, 2021 at 09:37:37PM +0800, 魏俊吉 wrote:
> On Wed, Sep 22, 2021 at 9:06 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Sep 22, 2021 at 08:08:44PM +0800, Junji Wei wrote:
> > > > On Sep 15, 2021, at 9:43 PM, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > <...>
> >
> > > >> 4. The FRMR api need to set key of MR through IB_WR_REG_MR.
> > > >>    But it is impossible to change a key of mr using uverbs.
> > > >
> > > > FRMR is more like memory windows in user space, you can't support it
> > > > using just regular MRs.
> > >
> > > It is hard to support this using uverbs, but it is easy to support
> > > with uRDMA that we can get full control of mrs.
> >
> > What is uRDMA?
>
> uRDMA is a software implementation of the RoCEv2 protocol like rxe.
> We will implement it in QEMU with VFIO or DPDK.

ok, thanks

>
> Thanks.
> Junji