Scaling finance: AI transformations driven by NVIDIA

Malcolm deMayo, Global Vice President – Financial Services Industry, NVIDIA

RevX NYC, May 22, 2024

Abstract

In this fireside chat from the Domino Data Lab RevX conference in NYC, Malcolm deMayo, Global VP of Financial Services at NVIDIA, shares his perspectives on the evolving landscape of AI in financial services. He discusses NVIDIA's advancements in accelerated computing and the impact of generative AI. Learn about the latest in AI technologies, from data quality and regulatory challenges to the transformative potential of AI factories and AI maturity models. Discover practical advice for data science and IT leaders on prioritizing AI initiatives, fostering experimentation, and the importance of staying ahead in the AI race.

Transcript

This transcript was generated using automated transcription tools. While we strive for accuracy, transcription errors may occur. Please be aware of potential discrepancies.

Thomas Been: Yeah, joining us… he's just out of a cab and jumped on the stage. So thank you for doing this. Very, very busy day for you, so we appreciate your presence. How about we get started? You're the Global VP of Financial Services; how about you tell us a little bit more about yourself?

Malcolm deMayo: Sure, first of all, Thomas, thank you for the warm welcome and the hospitality. And it's great to be here at RevX. So what we do at NVIDIA when we think of our go-to-market, we go to market by vertical.

And so I have 15 or 16 peers, and we each cover an industry. I cover financial services. What that essentially means is we're responsible for establishing our strategy, for understanding the problems we're going to leverage our platform to solve, and for making sure we have the right partnerships, like the one we have with Domino Data Lab.

[transcript missing due to audio error]

Fine-tuning techniques with retrieval-augmented generation have been able to achieve much higher levels of accuracy at a lower cost.
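To ground the terminology, here is a minimal, self-contained sketch of the retrieval step in RAG; the toy corpus, the bag-of-words similarity, and the generate() stub are illustrative assumptions, not any particular production stack (a real system would use an embedding model and an LLM):

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# Corpus, query, and generate() are hypothetical stand-ins.
from collections import Counter
import math

corpus = {
    "doc1": "expected credit loss is reported to the regulator daily",
    "doc2": "fraud scores must be produced within tens of milliseconds",
    "doc3": "model cards document the lineage of training data",
}

def bow(text: str) -> Counter:
    """Bag-of-words term counts; a real system would use embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = bow(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, bow(corpus[d])), reverse=True)
    return [corpus[d] for d in ranked[:k]]

def generate(query: str, context: list[str]) -> str:
    # Stub: a real implementation would prompt an LLM with the context,
    # grounding the answer in retrieved passages instead of model memory.
    return f"[answer to {query!r} grounded in: {context[0]}]"

print(generate("where does training data lineage live?",
               retrieve("training data lineage")))
```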

And we keep producing the most powerful accelerated compute platform at a pace unmatched in the industry. If you think back to around February of last year, we released the H100, which increased training performance by 5x over our previous generation, Ampere, and inference performance by around 12x. Your mileage will vary depending on the use case, but those are pretty good improvements. And when you improve the speedup, you're lowering the cost, because you're using the compute for less time.

Then in November we announced the H200, which did it again: roughly another 2x increase in both training and inference performance. And in March of this year, at our GTC AI conference, Thomas, we announced Blackwell. Blackwell kicked it up a few more notches: another 5 to 6x increase in training performance, and about a 30x improvement in inference. So we are focused both on the cost of training, bringing that curve down, and on flattening inference costs. As inference scales in a token-based monetization model, costs start to scale up, and we want to flatten that. That's one of our goals.
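To make the time-is-money point concrete, here is a minimal sketch; the hourly rate and runtime are invented for illustration, and only the rough 5x speedup comes from the talk:

```python
# Illustrative only: why a speedup lowers cost when compute is billed by time.
# The hourly rate and baseline runtime are assumptions, not NVIDIA figures.
hourly_rate = 40.0       # assumed $/hour for an accelerated instance
baseline_hours = 100.0   # assumed training runtime on the prior generation
speedup = 5.0            # the ~5x training speedup cited above

accelerated_hours = baseline_hours / speedup
baseline_cost = hourly_rate * baseline_hours
accelerated_cost = hourly_rate * accelerated_hours
print(f"${baseline_cost:,.0f} -> ${accelerated_cost:,.0f} at equal hourly rates")
# A newer GPU usually rents at a higher rate, so the realized saving is the
# speedup divided by the price ratio, not the raw speedup.
```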

The first challenge is data quality. The second is accuracy. The third is cost: how can we improve it? The fourth is: how do we keep up?

Every day, it seems, there's another announcement, and often more than one. We can't help you there; we think that pace is just going to continue to accelerate. Where we can help is that we keep doing the research and we keep building, but we stay in our swim lane: we build techniques into our platform so that you can take advantage of them without having to build all of this yourself. And we partner with best-in-class companies like Domino Data Lab to solve problems that are outside our swim lane, while letting you take full advantage of the ecosystem we've built. The last area, and we could probably talk about the challenges all day, is regulatory. Financial services is highly regulated, and this is an area that isn't going to get solved immediately. We don't have hard-and-fast rules yet, so it's kind of hard to solve.

But at the end of the day, the way we think about this is that you, as practitioners, have been doing this for a very long time. You have established governance, you have established calibration capabilities, you're going to recalibrate for generative AI over time, and you're going to have conversations with your regulator or regulators about how you're doing this and why you're doing it responsibly, because that's what you've always done. And we're going to support that by making sure that when we train a model, we produce a model card that explains, very granularly, the data lineage and similar details, so that you can say: we know exactly where the data this model was trained on came from. So those are some of the challenges we're hearing about.

Thomas: What excites you in terms of what you're seeing?

Malcolm: Well, I don't know if "excites" is the right word, but at the end of the day, there are a lot of use cases. The way I think about it, and the way NVIDIA thinks about financial services, is by segment. We think about trading, or capital markets, as a segment. We think about banking as a segment. We think about payments as its own, unique segment. And inside each of those segments, there are use cases that are really big problems. We try to solve the big problems: we try to understand the workflow and ask whether we can bring our platform to that workflow, injecting libraries and our compute platform to help you solve those problems in ways you haven't been able to in the past. To give you a generic example, it's not a use case, but across those topics I talked about earlier, high-performance computing, AI, scientific computing, data processing: we can do the work of a thousand CPU servers with 16 GPUs.

And the difference in CapEx is $10 million versus a little under $500,000, and megawatts of power consumption versus kilowatts. So the opportunity to bring more compute to a problem, and do it in a more power-efficient way, is real. It's available today.

And so we tend to see use cases start out as: let's do what we do today, better or faster. I'll give you an example: projected credit loss, or expected credit loss, for a large financial firm that has 100 million customers and a 60-month time horizon. They run that scenario daily, because they have to report to the Fed what they think their expected credit loss is, and it takes 20 hours to run, every day of the year. We can accelerate that to under four hours with our platform. What's interesting is that's a $20 million savings across the year in cloud compute costs. So you have an opportunity to say: okay, we're just going to bank the savings, get our bonus, life is good. Or you could sit back and think about: what did we do? What compromises did we make? What attributes aren't we using? What scenarios aren't we running? What more could we do? And this is what tends to happen over time, Thomas: we start out accelerating the workflow the way it is today, and then we start to think about how we would do it differently now that we have more time, more resources, more capability. So the thing that excites us is the opportunity to rethink how you're doing things, not just pave over what you've always done, how you've always done it.
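As a sanity check on that figure, here is a back-of-envelope sketch; only the 20-hour and under-four-hour runtimes and the roughly $20 million figure come from the talk, while the hourly fleet costs are illustrative assumptions:

```python
# Back-of-envelope for the expected-credit-loss example above. Only the
# 20h -> under-4h runtimes and the ~$20M/year savings come from the talk;
# the hourly fleet costs below are illustrative assumptions.
cpu_farm_rate = 3_000     # assumed $/hour for the CPU fleet
gpu_cluster_rate = 1_500  # assumed $/hour for the accelerated cluster
days = 365

cpu_annual = 20 * cpu_farm_rate * days    # 20-hour daily run on CPUs
gpu_annual = 4 * gpu_cluster_rate * days  # under-4-hour daily run on GPUs
print(f"annual savings: ${cpu_annual - gpu_annual:,.0f}")  # ~$19.7M
```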

I would say the use cases you read about, the ones we see going into production this year, are about using generative AI to generate code.

That's a real productivity opportunity, a real multiplier. I mean, you have thousands of software developers out there, and if they can be 10, 20, 30% more productive, that's an enormous multiplier. So that would be a good place to start. And the experience of calling your financial institution and hearing an IVR ("press one for..., press two for..., press three for...") or waiting for the next available agent, we just have to get rid of that.

And everybody, even you as practitioners, you call your banks. We have the opportunity to fix this, as long as we do it carefully. Klarna, if you missed it, announced in February that in the month of January they were able to do the work of 700 agents with an AI assistant. What I thought was interesting wasn't so much that; it was that the average time to errand resolution went from 11 minutes to two minutes.

That's huge, and that's a very basic implementation. There's massive use in trade execution and in risk management as well. And now that firms have these clusters, they're using them for price and risk too. So instead of doing price calculations or risk calculations overnight on a CPU farm, they're doing them in almost real time on the same clusters they use for alpha signal detection.

In banking, we're starting to see banks invest in what we call AI factories and what you would probably call a next-generation data center. For example, Bank of New York Mellon has built an AI factory; they call it the AI Hub. I don't know if there's anybody here from BNY Mellon, but great job. And by the way, we don't sell our platform directly. We don't pay anybody, and nobody at NVIDIA is paid commission; it's available entirely through our partners, through the clouds and the server vendors. I'm now starting to see it made available by software companies that embed the platform into their offerings. But at the end of the day, what BNY Mellon is doing is really cool. They've built an AI hub that their business lines can experiment on now. And because our platform is available everywhere, you can experiment and build there, and then deploy in any cloud, because we're multi-cloud.

And our software platform that runs on top of the hardware is 100% microservices-based, so Kubernetes containers, and highly portable.

So from a use case perspective in payments, and this is my personal view, probably not exactly right: before COVID, I think transaction fraud was viewed as a cost of doing business. But it has accelerated after COVID, with the explosion of card-not-present and e-commerce transactions, and it's predicted to hit $40 billion a year, if the number's right, in the next couple of years. So there's a real opportunity to leverage graphs, transformers, and ensembles of AI to start building relationship maps and finding the bad guys, within the same window you have today. From the moment the customer taps or swipes or clicks to the moment the transaction is authorized is about a second and a half, and within that you have about a 20 to 30 millisecond time frame to produce a fraud score. We think we can bring a lot more to that. So those are examples of use cases that we think are really hard to solve, and that we can solve with our platform, together with firms that want to be forward-leaning. We bring the AI and platform expertise, you bring the domain expertise, and we solve it together.
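As a rough illustration of that latency budget, here is a minimal, self-contained sketch; the model stub, thresholds, and fallback policy are all hypothetical stand-ins for a real graph/transformer ensemble:

```python
import time

# Minimal sketch (all names and thresholds hypothetical) of scoring a card
# transaction inside the ~20-30 ms budget described above. A production
# system would call a trained graph/transformer ensemble; a stub stands in
# for the model here so the timing harness is self-contained.
LATENCY_BUDGET_MS = 25.0

def score_transaction(txn: dict) -> float:
    """Stand-in for an ensemble fraud model; returns a score in [0, 1]."""
    # e.g., combine a graph-relationship score with a sequence-model score
    graph_score = 0.2 if txn["card_present"] else 0.6
    seq_score = min(txn["amount"] / 10_000, 1.0)
    return 0.5 * graph_score + 0.5 * seq_score

txn = {"amount": 1_250.0, "card_present": False}
start = time.perf_counter()
score = score_transaction(txn)
elapsed_ms = (time.perf_counter() - start) * 1_000

decision = "review" if score > 0.5 else "approve"
if elapsed_ms > LATENCY_BUDGET_MS:
    decision = "approve-with-flag"  # assumed fallback when the budget is blown
print(f"score={score:.2f} decision={decision} latency={elapsed_ms:.2f}ms")
```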

Thomas: Thanks, fascinating perspective. You don't know this, but it actually relates very well to some of the breakout sessions we've had, which address some of the same domains, so it's perfectly on point. If we go from the present to the future, what are the emerging AI technologies that you think are going to shape the landscape?

Malcolm: So we're an accelerated compute platform, and what we're working on is a lot more of what I've already talked about. We're announcing earnings at 5 o'clock today; Jensen will be on, and he's going to have a company meeting right after. I guarantee you, the first thing he's going to say to the company is: we have to go faster. It's funny, but it's the truth, and he means it. So we're going to continue to release new GPUs, a family of GPUs. By the way, the Jetson GPU is $59; the Hopper, in a package of eight, is $240,000. So there's a wide range of costs. The Jetson is what powers robots in Amazon's distribution centers, as an example, so that they can deliver product the next day. Believe it or not, Amazon actually predicts tomorrow's orders today and reconfigures its warehouses every night using these robots, trained on NVIDIA with our Jetson GPU and some of our robotics training software. A very cool use case outside of financial services. But when we think about the future, it's hard to predict.

Forecasting is always a little more art than science. But take a step back and realize that the software we use today is retrieval-based: we store stuff, we retrieve it, we reorganize it, we do all sorts of dashboard-y things with it. With generative AI, we're able to interact with it conversationally. That changes the game, because it means virtually anyone can use it. And secondly, it generates original content, very much like a human would. So generative AI is going to change the game. I said earlier that we tend to focus on the use cases we know and have today and sort of pave over them with technology. But as AI literacy grows, as generative AI literacy grows, you're going to start seeing original ideas. For example, the phone you're holding (I don't even know why we still call it a phone) was an original idea, right? It wasn't "take something that exists and make it better." So we're going to see generative AI impact the industry in ways that none of us can probably foresee. Think of it as an infinite pool of assistants: in the near future, you might ask your assistant to solve a problem and essentially give it a budget. You've got $1,000, here's the problem I want you to work on, and I want an answer by Monday night at 7. It's going to change things and make us all more productive in ways I don't think we can fully define right now.

Thomas: Very interesting. One of our speakers on this same stage was saying this morning: I don't want to ever have to book travel again; here's where I need to go and here's the budget. And by the way, the partnership between NVIDIA and Domino is exactly about enabling these use cases, bringing together the people and the processes with the technology, so that together we can show you the art of the possible that you can then take into your business.

Last question, because I know we only have you for half an hour, and I want to stay true to my promise. We have in front of us data science leaders and IT leaders who are involved in AI. What is your advice to them on leading the charge, leading the transformation, and making this world you described a reality in their businesses?

Malcolm: So, it's a great question, Thomas. Number one, you have to prioritize. Our teams engage with firms, and our goal is to educate, inspire, and help you learn how to use the platform, and how to leverage partners like Domino Data Lab in that process, so that you can do it efficiently and get the most out of the technology.

So number one is to prioritize. When we do that, what tends to happen is there are lots of ideas about how to use the technology, and you have to wrestle amongst yourselves: what are the best ideas, and how do you define "best"? It'll probably be a little different for each firm. But you definitely want to be able to show productivity gains, efficiency gains, or even revenue gains. The second thing is to create a place for your people to experiment. I can't emphasize this enough.

I don't know if you're familiar with a research group out of England called Evident AI. They've been around since a little before ChatGPT, but that really put them front and center. They have an AI maturity model, and they look at a number of attributes of how the largest banks in the world are maturing in their use of AI: how many research papers you're publishing, the patents you're filing, how many data science roles you're publicly recruiting for, and so on.

And I think bringing in and growing your AI literacy has to be a top priority for everyone, and you can't do that if you don't have a place for people to experiment. That can be in cloud or on-prem, but if your data scientists and your business practitioners can't get the job done, they're going to go somewhere else. We hear from the companies that come and see us that one of the interview questions data scientists out of university now ask is: what kind of compute do you have? If the answer isn't accelerated compute, you're going to have a really hard time recruiting them. So create a place where your people can experiment, and build your AI literacy. That doesn't mean you have to rush things to production, but learn the technology, learn how to use the capabilities, and put the right safeguards in place from a data-lineage, data-governance, and MLOps perspective. Get all of that in place; get that infrastructure built out.

And the last thing, and this isn't really advice, it's just calling out the obvious: we like to talk about credit risk, market risk, opportunity risk, counterparty risk, operational risk, cyber risk, but I don't see any headlines talking about the risk of falling behind. Well, let me tell you what that looks like.

I was in London two weeks ago, having a cup of coffee early in the morning, looking at the Financial Times, and right on the front cover a large multinational auto manufacturer was announcing a 20% drop in profit. The article essentially said they were now doubling their investment in AI and electric vehicles. Well, it's too late. That's what falling behind looks like. Maybe you can muscle your way back, but you need to build the capabilities today. We exist to build and to help, so we're happy to engage. And I have enjoyed this discussion. Thanks a lot.

Thomas: Well, Malcolm, thanks so much for joining us.