The Chinese artificial intelligence development company DeepSeek has released a new open-weight large language model (LLM).
DeepSeek uploaded its newest model, Prover V2, to the hosting service Hugging Face on April 30. The latest model, released under a permissive open-source license, aims to tackle math proof verification.
Prover V2 has 671 billion parameters, making it significantly larger than its predecessors, Prover V1 and Prover V1.5, which were released in August 2024. The paper accompanying the first version explained that the model was trained to translate math competition problems into formal logic using the Lean 4 programming language, a tool widely used for proving theorems.
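To give a sense of what "formal logic in Lean 4" looks like, here is a minimal toy theorem (an illustration, not drawn from DeepSeek's training data). Lean's kernel checks the proof mechanically, which is exactly the kind of verification Prover V2 targets:

```lean
-- A toy formal statement: addition of natural numbers is commutative.
-- Lean's kernel verifies the proof term mechanically.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Translating an informal competition problem into a statement like this, and then producing a proof the kernel accepts, is the task such models are trained on.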
Developers say that Prover V2 compresses mathematical knowledge into a format that allows it to generate and verify proofs, potentially aiding research and education.
Related: Here’s why DeepSeek crashed your Bitcoin and crypto
What does it all mean?
A model, also known informally (and somewhat incorrectly) as "weights" in the AI space, is the file or collection of files that allow users to run an AI locally without relying on external servers. Still, it is worth noting that state-of-the-art LLMs require hardware that most people do not have access to.
This is because these models tend to have high parameter counts, which result in large files that demand plenty of RAM or VRAM (GPU memory) and processing power to run. The new Prover V2 model weighs in at approximately 650 gigabytes and must be loaded into RAM or VRAM to run.
To get them down to this size, Prover V2's weights were quantized to 8-bit floating-point precision, meaning each parameter has been approximated to take up half the space of the usual 16 bits, with a bit being a single digit in binary numbers. This effectively halves the model's bulk.
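The size figures can be sanity-checked with simple arithmetic. This is a back-of-the-envelope sketch: the 671-billion parameter count comes from the article, and real checkpoint files add some overhead on top of the raw weight bytes:

```python
# Rough model-size estimate: parameter count x bytes per parameter.
params = 671e9  # Prover V2's reported parameter count

size_fp16_gb = params * 2 / 1e9  # 16 bits = 2 bytes per parameter
size_fp8_gb = params * 1 / 1e9   # 8 bits = 1 byte per parameter

print(f"fp16: ~{size_fp16_gb:.0f} GB")  # ~1342 GB before quantization
print(f"fp8:  ~{size_fp8_gb:.0f} GB")   # ~671 GB, near the ~650 GB on disk
```

The 8-bit figure lands close to the roughly 650 gigabytes the uploaded files occupy, which is consistent with the weights being stored at one byte per parameter.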
Prover V1 is based on the seven-billion-parameter DeepSeekMath model and was fine-tuned on synthetic data. Synthetic data refers to training data that was itself generated by AI models, with human-generated data increasingly seen as a scarce source of higher-quality data.
According to reports, Prover V1.5 improved on the previous version by optimizing both training and execution and achieving higher accuracy on benchmarks. So far, the improvements introduced by Prover V2 are unclear, as no research paper or other information had been published at the time of writing.
The number of parameters in the Prover V2 weights suggests that it is likely based on the company's previous R1 model. When it was first released, R1 made waves in the AI space with performance comparable to OpenAI's then state-of-the-art o1 model.
Related: South Korea suspends DeepSeek downloads over user data concerns
The importance of open weights
Publicly releasing the weights of LLMs is a controversial topic. On one hand, it is a democratizing force that allows the public to access AI on their own terms, without relying on a private company's infrastructure.
On the other hand, it means the company cannot step in and prevent abuse of the model by enforcing certain limitations on dangerous user queries. The release of R1 in this manner raised security concerns, and some described it as China's "Sputnik moment."
Open-source advocates rejoiced that DeepSeek continued where Meta left off with the release of its open-source AI models, showing that open AI is a serious contender for OpenAI's closed AI. The accessibility of these models also continues to improve.
Accessible language models
Now, even users without access to a supercomputer that costs more than the average home in much of the world can run LLMs locally. This is primarily thanks to two AI development techniques: model distillation and quantization.
Distillation refers to training a compact "student" network to replicate the behavior of a larger "teacher" model, so it keeps most of the performance while cutting the parameter count to make it accessible on less powerful hardware. Quantization consists of reducing the numeric precision of a model's weights and activations to shrink its size and speed up inference, with only a minor loss of accuracy.
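As a concrete illustration of the distillation idea, the toy sketch below trains a one-parameter "student" to mimic a "teacher" function via gradient descent on the difference between their outputs. Real distillation involves full neural networks and soft-target losses, but the principle is the same:

```python
# Minimal distillation sketch: the student learns from the teacher's
# outputs rather than from labeled data. Both "models" here are toy
# one-parameter linear functions, purely for illustration.

def teacher(x):
    return 3.0 * x  # stands in for a large, expensive model

# Student: y = w * x, with w learned by gradient descent on the
# squared difference between student and teacher outputs.
w = 0.0
lr = 0.01
data = [0.5, 1.0, 1.5, 2.0]

for _ in range(200):
    for x in data:
        error = w * x - teacher(x)  # match the teacher, not labels
        w -= lr * 2 * error * x     # gradient of (w*x - t)^2 w.r.t. w

print(f"student weight: {w:.3f}")  # converges toward the teacher's 3.0
```

The student ends up reproducing the teacher's behavior with far fewer parameters, which is why distilled models can run on modest hardware.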
One example is Prover V2's reduction from 16- to eight-bit floating-point numbers, but further reductions, halving the bits once again, are possible. While both techniques have consequences for model performance, they usually leave the model largely functional.
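The quantization step can likewise be sketched in a few lines. This example uses simple symmetric 8-bit integer quantization for clarity (Prover V2 itself uses an 8-bit floating-point format, and the weight values below are made up):

```python
# Minimal quantization sketch: map float weights to 8-bit integers with
# a shared scale factor, then dequantize and measure the error.
weights = [0.91, -0.43, 0.072, -1.20, 0.0035]

scale = max(abs(w) for w in weights) / 127  # largest weight maps to 127

quantized = [round(w / scale) for w in weights]  # 8-bit integer codes
dequantized = [q * scale for q in quantized]     # approximate originals

max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(quantized)
print(f"max round-trip error: {max_error:.4f}")  # at most half a scale step
```

Each weight now fits in one byte instead of two (or four), at the cost of a small rounding error, which is the trade-off described above.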
DeepSeek R1 was distilled into versions using Llama and Qwen models, ranging from 70 billion parameters down to as few as 1.5 billion. The smallest of these models can even run reliably on some mobile devices.
Magazine: ‘Chernobyl’ needed to happen to wake people up to the risks of AI, Studio Ghibli memes: AI Eye
