
What’s new in DeepSeek’s latest model?

by SuperiorInvest

Anna Barclay | Getty Images News | Getty Images

The latest experimental model from Chinese startup DeepSeek promises to boost efficiency and improve AI’s ability to handle large amounts of information at a fraction of the cost, but questions remain over how effective and safe the architecture is.

DeepSeek sent Silicon Valley into a frenzy when it launched its first R1 model seemingly out of nowhere last year, showing that it is possible to train large language models (LLMs) quickly, on less powerful chips, using fewer resources.

The company launched DeepSeek-V3.2-Exp on Monday, an experimental version of its current DeepSeek-V3.1-Terminus model, which builds further on its mission of increasing efficiency in AI systems, according to a post on the AI forum Hugging Face.

“DeepSeek V3.2 continues the focus on efficiency, cost reduction and open-source sharing,” Adina Yakefu, Chinese community lead at Hugging Face, told CNBC. “The big improvement is a new feature called DSA (DeepSeek Sparse Attention), which makes the AI better at handling long documents and conversations. It also cuts the cost of running the AI in half compared to the previous version.”

“It is significant because it should make the model faster and cheaper to use without a noticeable drop in performance,” said Nick Patience, vice president and practice lead for AI at The Futurum Group. “This makes AI more accessible to smaller developers, researchers and companies, which could lead to a wave of new and innovative applications.”

The pros and cons of sparse attention

An AI model makes decisions based on its training data and new information, such as a prompt. Say an airline wants to find the best route from A to B: while there are many options, not all are feasible. By filtering out the less viable routes, it drastically cuts the time, fuel and, ultimately, money needed to make the trip. That is exactly what sparse attention does — it factors in only the data it deems important for the task at hand, unlike earlier models, which have processed all of the data available to them.
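The article does not describe DSA’s internal mechanism, so as an illustration only, the sketch below shows a generic form of the idea: top-k sparse attention, where each query token scores all keys but keeps only the k highest-scoring ones, analogous to the airline discarding non-viable routes before computing the final answer.

```python
# Minimal sketch of generic top-k sparse attention (NOT DeepSeek's actual
# DSA implementation, whose details are not given in the article).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Every query attends to every key: O(n^2) work in sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def topk_sparse_attention(Q, K, V, k=4):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Keep only the k highest-scoring keys per query; mask out the rest
    # so their attention weight becomes zero after the softmax.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked) @ V

rng = np.random.default_rng(0)
n, d = 16, 8  # 16 tokens, 8-dimensional head (toy sizes)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out_full = full_attention(Q, K, V)
out_sparse = topk_sparse_attention(Q, K, V, k=4)
print(out_full.shape, out_sparse.shape)  # both (16, 8)
```

The trade-off the article describes is visible here: the sparse variant ignores most key/query pairs, which saves compute on long sequences, but whether the discarded pairs really were unimportant depends entirely on the selection mechanism.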

“So, basically, you cut out the things you think are not important,” said Ekaterina Almasque, co-founder and managing partner of new venture capital fund BlankPage Capital.

Sparse attention is a boon for efficiency and the ability to scale, given that fewer resources are needed, but one concern is that it could make models less reliable, due to the lack of oversight over how and why the model discounts information.

“The reality is that they [sparse attention models] have lost a lot of nuances,” said Almasque, who was an early backer of Dataiku and Darktrace, and an investor in Graphcore. “And then the real question is, did they have the right mechanism to exclude unimportant data, or is there a mechanism excluding really important data, and then the outcome will be much less relevant?”

This could be particularly problematic for AI safety and inclusivity, the investor said, adding that it may not be “the most secure AI model” to use compared with competitors or traditional architectures.

DeepSeek, however, says the experimental model performs on par with its V3.1-Terminus. Despite speculation that a bubble is forming, AI remains at the center of geopolitical competition, with the U.S. and China vying for the winning spot. Yakefu said DeepSeek’s models work “right out of the box” with Chinese-made chips such as Ascend and Cambricon, meaning they can run locally on domestic hardware without any extra setup.

DeepSeek also shared the actual programming code and tools needed to use the experimental model, she said. “This means other people can learn from it and build their own improvements.”

But for Almasque, the very nature of this means the technology may not be defensible. “The approach is not super new,” she said, noting the industry has been “talking about sparse models since 2015” and that DeepSeek cannot patent its technology because it is open source. DeepSeek’s competitive edge, therefore, must lie in how it decides which information to include, she added.

The company itself acknowledges that V3.2-Exp is an “intermediate step toward our next-generation architecture,” per the Hugging Face post.

As Patience pointed out, “this is DeepSeek’s value proposition all along: efficiency is becoming as important as raw power.”

“DeepSeek is playing the long game to keep the community invested in its progress,” Yakefu added. “People will always go for what is cheap, reliable and effective.”
