Researchers use AI to tackle network congestion


Gal Dalal wants to make commuting easier for those who work from home or in the office.

The NVIDIA lead researcher, part of a 10-person lab in Israel, is using AI to reduce congestion on computer networks.

For laptop jockeys, a spinning circle of death (or worse, a frozen cursor) is as bad as a sea of red lights on the freeway. As with rush hour, it is caused by a flood of travelers all trying to get somewhere fast, crowding and sometimes colliding along the way.

AI at the intersection

Networks use congestion control to manage digital traffic. It is essentially a set of rules built into network adapters and switches, but as the number of users on a network grows, the conflicts among them can become too complex to anticipate.

AI promises to be a better traffic cop because it can see and react to patterns as they develop. That’s why Dalal is one of many researchers around the world looking for ways to make networks smarter through reinforcement learning, a type of AI that rewards models when they find good solutions.

But so far, for several reasons, no one has come up with a practical approach.

Race against the clock

Networks must be both fast and fair so that no request is left behind. It's a tough balancing act when no driver on the digital road can see the full, ever-changing map of other drivers and their intended destinations.

And it’s a race against time. To be effective, networks must respond to situations in about a microsecond, or one millionth of a second.

To smooth traffic, the NVIDIA team created new reinforcement learning techniques inspired by cutting-edge computer game AI and adapted them to the networking problem.

Part of their breakthrough, described in a 2021 paper, was an algorithm and a corresponding reward function that keep the network balanced using only the local information available to individual network streams. The algorithm allowed the team to build, train, and run an AI model on their NVIDIA DGX system.
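To make the idea of a reward built only from local information more concrete, here is a minimal Python sketch. It is not the reward function from the team's paper: the function name `local_reward`, its parameters, the formula, and the default `target` are all assumptions chosen for illustration. The only point is that every input is something a single stream can measure by itself, namely its own sending rate and its own round-trip time.

```python
import math


def local_reward(rate_gbps: float, rtt_us: float, base_rtt_us: float,
                 target: float = 1.0) -> float:
    """Hypothetical per-stream reward (an assumption, not the published one).

    Uses only signals a single stream can observe locally:
    its own sending rate and its own round-trip time (RTT).
    """
    if rate_gbps < 0 or rtt_us <= 0 or base_rtt_us <= 0:
        raise ValueError("rate must be non-negative and RTTs must be positive")

    # How much longer packets take than on an empty network (1.0 = no queuing).
    rtt_inflation = rtt_us / base_rtt_us

    # Combine delay and rate into one congestion signal. A stream that sends
    # fast while delay is high pushes the signal above the target.
    signal = rtt_inflation * math.sqrt(rate_gbps)

    # Penalize any distance from the target; the best possible reward is 0.
    return -(target - signal) ** 2


# Example: the same stream on a quiet network and on a congested one.
quiet = local_reward(rate_gbps=1.0, rtt_us=10.0, base_rtt_us=10.0)
congested = local_reward(rate_gbps=4.0, rtt_us=20.0, base_rtt_us=10.0)
print(quiet, congested)
```

Because the reward never looks at other streams, each stream can evaluate it independently, which is what makes this kind of design plausible for hardware that sees only its own traffic.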

The wow factor

Dalal recalls the meeting where fellow Nvidian Chen Tessler showed the first graph plotting the model's results on a simulated InfiniBand data center network.

“We were like, wow, ok, this is working great,” said Dalal, who wrote his Ph.D. thesis on reinforcement learning at the Technion, Israel’s prestigious technical university.

“What was particularly gratifying was that we trained the model on just 32 network streams, and what it learned generalized well to more than 8,000 streams with all sorts of complex situations, so the machine was doing a much better job than the predefined rules,” he added.

Reinforcement learning (purple) outperformed all rule-based congestion control algorithms in NVIDIA’s tests.

In fact, the algorithm delivered at least 1.5x higher throughput and 4x lower latency than the best rule-based technique.

Since the paper was published, the work has been hailed as a real-world application that shows the potential of reinforcement learning.

AI processing in the network

The next big step, still in progress, is to design a version of the AI model that can run at microsecond speeds using the network’s limited compute and memory resources. Dalal outlined two paths to follow.

His team works with the engineers who design NVIDIA BlueField DPUs to optimize AI models for future hardware. BlueField DPUs aim to perform an increasing set of communication tasks inside the network, offloading tasks from overloaded CPUs.

Meanwhile, Dalal’s team is distilling the essence of its AI model into a machine learning technique called boosting trees, a series of yes/no decisions that is almost as smart but much simpler to execute. The team aims to present its work later this year in a form that could be immediately adopted to ease network traffic.
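The idea of distilling a model into a series of yes/no decisions can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the team's method: `teacher_policy` is a made-up stand-in for a trained model, the single input feature (RTT inflation) is hypothetical, and the student is a hand-rolled set of gradient-boosted decision stumps rather than any production library.

```python
def teacher_policy(rtt_inflation: float) -> float:
    """Made-up stand-in for a trained model: returns a rate multiplier."""
    return 1.0 / rtt_inflation  # back off as queuing delay grows


def fit_stump(xs, residuals):
    """Find the one yes/no split (x <= threshold) with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, rounds=50, learning_rate=0.3):
    """Each round fits a stump to the errors left over by the previous rounds."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        left_step = learning_rate * left_mean
        right_step = learning_rate * right_mean
        stumps.append((threshold, left_step, right_step))
        preds = [p + (left_step if x <= threshold else right_step)
                 for x, p in zip(xs, preds)]
    return base, stumps


def predict(model, x):
    """Evaluate the student: a starting value plus one small step per yes/no question."""
    base, stumps = model
    return base + sum(left if x <= t else right for t, left, right in stumps)


def mse(predictions, targets):
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


# Query the teacher on a grid of inputs, then train the student to imitate it.
xs = [1.0 + 0.1 * i for i in range(31)]
ys = [teacher_policy(x) for x in xs]
student = fit_boosted_stumps(xs, ys)

baseline_mse = mse([sum(ys) / len(ys)] * len(ys), ys)
student_mse = mse([predict(student, x) for x in xs], ys)
print(baseline_mse, student_mse)
```

At run time the student needs only comparisons and additions, which hints at why a tree-based form might suit hardware with tight compute and memory budgets.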

A timely traffic solution

To date, Dalal has applied reinforcement learning to everything from autonomous vehicles to data center cooling and chip design. When NVIDIA acquired Mellanox in April 2020, the NVIDIA Israel researcher began collaborating with his new colleagues from the neighboring networking group.

“It made sense to apply our AI algorithms to the work of their congestion control teams, and now, two years later, the research is more mature,” he said.

The timing is right. Recent reports of double-digit increases in car traffic in Israel since the pre-pandemic period could encourage more people to work from home, worsening network congestion.

Luckily, an AI traffic cop is on the way.

