Meta AI introduces the OPT-175B large language model


Hello and welcome to Protocol Enterprise! Today: Meta’s new large language model for text prediction, Western Digital eyes a flash memory split, and AMD’s revenue gets a boost.

Spin up

After years of discussion, new research from Celonis has found that companies are taking sustainability improvements seriously. According to the survey results, 87% of companies are automating their supply chains to improve sustainability and 51% are willing to live with lower margins to achieve these goals.

The Problem With Meta’s AI Transparency Charm Offensive

Meta has finally revealed an algorithm – but not the one it uses to power Facebook or Instagram.

Word among computer scientists on Monday was that Meta planned to reveal a big new language model rivaling OpenAI’s GPT-3, the technology that has formed the basis of chatbots, automated customer service tools and more. “Come on, open the repo already,” wrote an ML engineer on Twitter, referencing the GitHub repository of code, data, and documentation associated with the new model.

On Tuesday, Meta unveiled the codebase, development process logbook, data, research paper, and other information associated with Open Pretrained Transformer, or OPT-175B, its new 175 billion-parameter open source large language model.

  • The company called the effort an exercise in transparency that is part of its commitment to open science.
  • Referring to GPT-3, Joelle Pineau, Managing Director of Meta AI, told Protocol: “Of course, others preceded us in terms of training large language models and in some cases provided an API to run inference. But the code and trained parameters for these models have not been released to the wider research community.
  • “With the release of OPT-175B, we are opening up direct access to large-scale models to this community for the first time, so that the scientific discourse on LLMs can be conducted on reproducible results,” she said.

As of this morning, a Facebook research repository on GitHub was available to developers, loaded with code files and other documentation.

  • In line with emerging approaches to AI model transparency, Meta researchers have included a “model card” – a concept popularized by former Google researcher Timnit Gebru – to explain the details of the datasets used to train OPT-175B.
  • The Meta team used a combination of datasets, including one containing text from thousands of unpublished books and data collected over years of web crawling.
  • Pineau said no Facebook or Instagram user data was used to train the model.
  • “Meta did not use any user metadata or proprietary data to train OPT-175B, as our goal was to be able to publicly release the models and documentation to the AI research community as part of our commitment to accessible, reproducible and transparent science,” she said.

Training large language models requires massive amounts of computation, which sucks up huge amounts of energy. Meta addressed the climate impact of natural language processing AI.

  • In its OPT-175B paper, the company said its model was developed with an estimated carbon footprint of 75 tons. The researchers compared this to the carbon footprint created when training other large language models, including GPT-3 (500 tons) and Gopher (380 tons).
  • “We recognize, however, that recent developments in AI research have consumed an extraordinary amount of computing power,” Pineau told Protocol. “While industry labs have begun to report on the carbon footprint of these models, most do not include the computational cost associated with the experimental phases of R&D, which in some cases can be an order of magnitude more resource-intensive than training the final model.”
  • She added: “By sharing our models, we aim to reduce the collective carbon footprint of the field as we pursue research at this scale. Otherwise, studying these models will require repeated efforts to reproduce them, further amplifying computational costs.”
  • Hardware issues may have contributed to wasted energy while training the model. In their paper, the researchers wrote, “We faced a significant number of hardware failures in our compute cluster while training OPT-175B. In total, hardware failures contributed to at least 35 manual restarts and the cycling of over 100 hosts over the course of 2 months.”

But despite Meta’s transparency charm offensive, there’s an elephant in the room as big as a data-hungry large language model.

  • Meta is under intense pressure to reveal details of the algorithmic systems it uses to decide which Facebook or Instagram posts get amplified or removed, which ads get kicked off the platform, or which posts get caught in its moderation nets.
  • But the OPT-175B transparency initiative doesn’t provide more insight into how the AI models that govern two of the most influential social media platforms on the planet are built or operate.
  • In fact, OPT-175B is not used by the company on its social platforms. “Currently, OPT-175B is only used internally as a tool for research purposes,” Pineau said.
  • “The level of transparency we’re providing with this release, including publishing our logbook and notes, really speaks to our commitment to accessible, reproducible, and transparent science,” Pineau said.

While Facebook critics and lawmakers demanding more transparency from Meta may not see Tuesday’s language model release as a real opening up, tech people had a different perspective.

  • Zoom AI scientist Awni Hannun seemed surprised by Meta’s acknowledgment of hardware failures.
  • He tweeted: “Meta’s OPT 175B is a nice ‘behind the scenes’ approach to training LLMs. The instability of both hardware and training is a big challenge.”

-Kate Kaye (E-mail | Twitter)


Our workplace has changed in many ways. Most work now happens inside technology, hybrid working arrangements seem to be lingering, and organizations are trying to keep up. Join us NEXT WEEK on May 10 at Guide: The Digital Adoption Summit to learn how your organization can adapt to the digital workplace.


Storage is stronger apart?

Spinning hard drives and flash storage chips are technologies that have little to do with each other, beyond the fact that they both help servers store bits. Still, Western Digital manufactures flash drives and hard drives under one roof, and a letter sent to the company Tuesday by Elliott Investment Management could change that.

The activist investor asked the board to split the company into its constituent parts, a move that would effectively undo WD’s acquisition of SanDisk, the deal that got it into the flash business in the first place.

The reasoning goes like this: the promise of the SanDisk deal hasn’t borne significant fruit. There is no advantage in trying to operate two units that have very little to do with each other, whether in terms of technology or when selling both products to potential customers. The flash memory and hard drive businesses would benefit from being stand-alone companies, Elliott said.

Western Digital said it would carefully review Elliott’s plan.

— Max A. Cherney (E-mail | Twitter)

Around the enterprise

AMD’s first-quarter earnings came in well above Wall Street estimates, thanks to a rebound in the PC market and the gains it continues to make against Intel in the data center.

SAP has hired a banking adviser in hopes of selling its Litmos learning software division for no less than $1 billion, according to Reuters.

Intel acquired Siru, a graphics chip design company that could help it build “emerging accelerated compute solutions, across blockchain, metaverse, high-performance edge computing, and hyperscale,” which, of course.


What makes managing a complex IT portfolio difficult? How can IT take the lead in software adoption? What role should cross-functional partners play in their strategy? You’ll get the answers to these questions and more from the leaders of Asana, Linksys and ELF Beauty during our CIO panel at Guide: The Digital Adoption Summit. Join us NEXT WEEK on May 10th.


Thanks for reading – see you tomorrow!

