AI transformers need to be smaller and cheaper


Hello and welcome to Protocol Enterprise! Today: how researchers are trying to squeeze popular but large AI transformers into smaller packages, how Wells Fargo has split its multicloud strategy, and the latest developments in enterprise technology.

Spin up

Crime waves come and go, but banks and other financial institutions always have been and always will be a bigger target than most businesses. According to new research from VMware, 63% of financial institutions saw an increase in cyberattacks over the previous year, and 74% experienced at least one ransomware attack.

More than meets the eye

Transformer networks, known colloquially to deep learning practitioners and computer engineers as “transformers,” are all the rage in AI. In recent years these models, known for their massive size, huge volumes of input data and vast parameter counts (and, by extension, their high carbon footprint and cost), have gained favor over other types of neural network architectures.

Today, chipmakers and researchers want to make them faster and more agile.

  • “It is interesting to see how quickly neural network technology is evolving. Four years ago, everyone was using these recurrent neural networks for these language models, and then the attention paper was introduced, and all of a sudden everyone is using transformers,” said Bill Dally, chief scientist at Nvidia, at an AI conference held last week by Stanford HAI.
  • Dally was referring to an influential 2017 Google research paper showcasing an innovative architecture forming the backbone of transformer networks that relies on “attention mechanisms” or “self-attention”, a new way of processing model data inputs and outputs.
  • “The world pivoted in a few months and everything changed,” Dally said.
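
For readers who want to see what “self-attention” means in practice, here is a minimal sketch in plain Python. It is an illustration only, not the code from the Google paper: it assumes the queries, keys and values are the token vectors themselves, whereas a real transformer first multiplies them by learned weight matrices. The function names and the toy input are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the largest score first so the exponentials stay numerically stable.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Assumption: queries, keys and values are the raw token vectors
    (no learned projection matrices), to keep the sketch short.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Score this token against every token in the sequence, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # The output is a weighted average of all the value vectors.
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ])
    return outputs

# Toy sequence of three two-dimensional "token" vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Because every token is scored against every other token, the work grows with the square of the sequence length, which is one reason bigger transformers are more expensive to run.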

But some researchers want to go even further. The goal is not only to make these compute- and power-hungry models more efficient, but eventually to improve their design so that they can process new data on edge devices without shuttling it back and forth to the cloud.

  • In an April paper, a group of researchers from Notre Dame and China’s Zhejiang University presented a way to reduce memory bottlenecks, along with computing and power requirements.
  • The “iMTransformer” approach is a transformer accelerator that reduces memory transfers by computing in memory, and cuts the number of operations required by caching reusable model parameters.
  • Right now, the trend is to make transformers bigger so that models get big enough to take on increasingly complex tasks, said Ana Franchesca Laguna, who holds a doctorate in computer science and engineering at Notre Dame.
  • As for the big natural language processing models, she said, “It’s the difference between a sentence or a paragraph and a book.” But, she added, “The bigger the transformers, the bigger your energy footprint also gets.”

Using an accelerator like the iMTransformer could help reduce this footprint and, in the future, help build transformer models that can ingest, process and learn from new data on edge devices.

  • “Having the model closer to you would be really helpful. You could have it in your phone, for example, so it’s more accessible for edge devices,” Laguna said.
  • This means that IoT devices such as Amazon’s Alexa, Google Home or factory equipment maintenance sensors could process voice or other data in the device rather than having to send it to the cloud, which takes more time and more computing power, and could expose the data to potential privacy breaches, she said.
  • IBM also introduced an AI accelerator called RAPID last year.
  • “Scaling the performance of AI accelerators across generations is critical to their success in commercial deployments,” the company’s researchers wrote in a paper. “The inherently error-resistant nature of AI workloads presents a unique opportunity for improved performance/energy through precision scaling.”
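
To make “precision scaling” concrete, here is a minimal sketch of one common form of it: storing numbers as low-bit integers instead of full floating-point values. This is a generic illustration, not IBM’s RAPID design; the symmetric rounding scheme, the function names and the sample weights are all assumptions chosen for the example.

```python
def quantize(values, bits=8):
    """Map floats to signed integers of the given bit width.

    Assumption: a simple symmetric scheme with one shared scale factor.
    """
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(v) for v in values) / levels
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale works
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize(codes, scale):
    # Recover approximate floats from the integer codes.
    return [c * scale for c in codes]

# Made-up weights standing in for a small slice of a model.
weights = [0.82, -0.41, 0.07, -1.27, 0.55]

codes, scale = quantize(weights, bits=8)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Fewer bits means a coarser grid and a larger worst-case rounding error.
codes4, scale4 = quantize(weights, bits=4)
restored4 = dequantize(codes4, scale4)
max_error4 = max(abs(w - r) for w, r in zip(weights, restored4))
```

The rounding error is never more than half of one step on the integer grid, which is why workloads that tolerate small errors can trade precision for lower memory and energy use.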

Laguna uses a work-from-home analogy to explain the benefits of processing data for AI models at the edge.

  • “[Instead of] commuting between your home and your office, you are actually working from home. Everything is in one place, which saves a lot of energy,” she said.
  • Laguna and the other researchers she worked with tested their accelerator approach using smaller chips, then extrapolated their results to estimate how the process would work on a larger scale.
  • However, turning the small-scale project into a larger-scale reality will require larger, custom chips.

The investor interest needed to build such chips may well be there. AI is driving increased investment in chips for specific use cases. According to data from PitchBook, global AI chip sales rose 60% from 2020 to reach $35.9 billion last year. About half of that total came from specialized AI chips in mobile phones.

  • Systems designed to operate at the edge with less memory, rather than in the cloud, could enable AI-based applications that respond to new information in real time, said Jarno Kartela, global head of AI consulting at consulting firm Thoughtworks.
  • “What if you could build systems that learn on their own in real time and learn by interaction?” he said. “These systems, you don’t need to run them on cloud-only environments with massive infrastructure — you can run them virtually anywhere.”

-Kate Kaye


Wells Fargo likes Microsoft Azure, except for the data part

As multicloud strategies continue to evolve, it is increasingly interesting to see which clouds customers choose for different workloads.

Wells Fargo plans to use Microsoft Azure for “the bulk” of the cloud portion of its hybrid cloud strategy, which it hopes will save the company $1 billion over the next ten years, according to a Business Insider interview with CIO Chintan Mehta published Thursday. However, it will place its “advanced workloads,” especially data and AI, on Google Cloud.

While Microsoft will get a decent windfall from landing a big customer like Wells Fargo, data and AI workloads are some of the most profitable areas in cloud computing because they are compute-intensive. And once a business puts its critical data in a particular cloud, it’s unlikely to move that data for a very long time, given the effort required.

Google Cloud has relied on the strength of its data and AI tools, especially BigQuery, for years as it tried to challenge AWS and Microsoft for cloud business. If a new generation of cloud converts finds that running apps on different clouds works for them, cloud providers might have some decisions to make about how and where they plan to differentiate themselves now that the basic ideas behind cloud computing are widely accepted.

-Tom Krazit


Thanks for reading – see you tomorrow!

