Anthropic launched two new artificial intelligence (AI) models and a new AI capability on Tuesday. The biggest introduction is an upgraded version of Claude 3.5 Sonnet, which is said to offer improved benchmark scores across different categories. The new 3.5 Sonnet also gets a new capability dubbed Computer Use, which lets it understand and interact with computers, essentially allowing it to control and complete tasks on PCs. Further, the AI firm also announced Claude 3.5 Haiku, the successor to Claude 3 Haiku.
Upgraded Claude 3.5 Sonnet With Computer Use Launched
In a newsroom post, Anthropic announced an upgraded Claude 3.5 Sonnet, which offers improved performance compared to the AI model released in June. The AI firm claimed that the new model outperforms GPT-4o and Gemini 1.5 Pro in benchmarks such as Graduate-Level Google-Proof Q&A (GPQA), Massive Multitask Language Understanding (MMLU) Pro, and the coding-focused HumanEval.
However, the most significant improvements were claimed in two particular benchmarks: Software Engineering Benchmark (SWE-bench), which increased from 33.4 percent to 49 percent, and Tool-Agent-User (TAU-bench), which moved from 62.6 percent to 69.2 percent. Both of these benchmarks relate to AI agentic performance.
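To put those figures in perspective, the short Python sketch below works out the gain in percentage points and the relative improvement for each benchmark. It uses only the four scores quoted above; the function name `improvement` is an illustrative assumption and does not come from Anthropic.

```python
def improvement(old: float, new: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = new - old
    relative = points / old * 100
    # Round to one decimal place to match the precision of the quoted scores.
    return round(points, 1), round(relative, 1)


# Scores as reported in the article (percent of tasks solved).
swe_points, swe_relative = improvement(33.4, 49.0)
tau_points, tau_relative = improvement(62.6, 69.2)

print(f"SWE-bench: +{swe_points} points ({swe_relative}% relative)")
print(f"TAU-bench: +{tau_points} points ({tau_relative}% relative)")
```

On these numbers, the SWE-bench gain is the larger of the two, both in percentage points and relative to the earlier score.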
This AI agentic capability is relevant since Anthropic introduced the new Computer Use capability that allows AI models to control and complete tasks on PCs. Currently, this capability is available via an application programming interface (API) and only runs on Claude 3.5 Sonnet.
With Computer Use, Claude is learning general computer skills. With specialised software, it can imitate keystrokes, button clicks, and cursor movements. Combined with the AI model’s existing computer vision capability, this means Claude 3.5 Sonnet can see what’s happening on the screen and process the information to carry out specific tasks. The feature will work based on prompts provided to the AI.
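The see-then-act cycle described above can be pictured as a simple loop: capture the screen, decide on one input action, apply it, and repeat. The Python sketch below is a toy illustration of that idea only. It does not call Anthropic's API, and every name in it (`Action`, `FakeDesktop`, `run_agent`, `scripted_policy`) is hypothetical; a scripted function stands in for the model and a small in-memory object stands in for the PC.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One simulated input event (hypothetical type, for illustration only)."""
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen; it just records what the agent did."""
    clicks: List[Tuple[int, int]] = field(default_factory=list)
    typed: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real system would return pixels; a text summary is enough here.
        return f"clicks={len(self.clicks)} typed={''.join(self.typed)!r}"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.clicks.append((action.x, action.y))
        elif action.kind == "type":
            self.typed.append(action.text)


def run_agent(prompt: str, desktop: FakeDesktop,
              decide: Callable[[str, str], Action], max_steps: int = 10) -> int:
    """Observe the screen, ask the policy for an action, apply it, repeat.

    Returns the step at which the policy signalled "done", or max_steps.
    """
    for step in range(1, max_steps + 1):
        observation = desktop.screenshot()
        action = decide(prompt, observation)
        if action.kind == "done":
            return step
        desktop.apply(action)
    return max_steps


def scripted_policy(prompt: str, observation: str) -> Action:
    """Toy stand-in for the model: click a field, type the prompt, then stop."""
    if observation.startswith("clicks=0"):
        return Action("click", x=100, y=200)
    if "typed=''" in observation:
        return Action("type", text=prompt)
    return Action("done")


desktop = FakeDesktop()
steps = run_agent("hello", desktop, scripted_policy)
print(steps, desktop.clicks, desktop.typed)
```

The `max_steps` cap is a design choice worth noting: an agent that controls a computer needs a hard stop so that a confused policy cannot loop indefinitely.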
For instance, users can ask the large language model (LLM) to book tickets on a website, fill out an application form, or even download and install an application. While specialised tools that can automate certain PC tasks already exist, a general-purpose tool that works on natural-language prompts is a significant milestone for generative AI technology.
However, Anthropic admits that this capability is still in its nascent stage and that there are certain limitations. “Some actions that people perform effortlessly—scrolling, dragging, zooming—currently present challenges for Claude,” the company highlighted. For now, it advises developers to use this capability only for low-risk tasks.
With automated computer control capabilities, there are concerns about whether the AI model can be engineered to perform harmful and illegal actions. The company has not revealed any details about the security of the AI model and the safety of users at present. Notably, the upgraded Claude 3.5 Sonnet is available to all users, and developers can build on this capability via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.
Claude 3.5 Haiku Announced
Another major announcement was the unveiling of Claude 3.5 Haiku. For context, Haiku is the cheapest and fastest AI model series offered by Anthropic. The AI firm now claims that the successor to Claude 3 Haiku outperforms Claude 3 Opus, the company’s previous flagship-grade model. This means users can now access a powerful AI model at a cheaper price point.
Claude 3.5 Haiku will be released later this month across various platforms including the company’s API, Amazon Bedrock, and Google Cloud’s Vertex AI. It will initially be available as a text-only model and will later be updated to accept images as input.