Beijing Business Daily

"With 1 yuan you can buy 2 million tokens, equivalent to the text of five Xinhua Dictionaries." When Liu Weiguang, senior vice president of Alibaba Cloud Intelligence Group, announced the price cuts for Tongyi Qianwen, industry insiders in the audience whispered, "Is this aimed at ByteDance?" On the morning of May 21, Alibaba Cloud (Aliyun) announced that the API (application programming interface) input price of Qwen-Long, a flagship Tongyi Qianwen model, had been cut from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens, a reduction of 97%. That afternoon, Baidu announced that two of its flagship models would be free.

Generally speaking, one Chinese word, English word, number or symbol counts as 1 token. When it comes to comparing prices across the market, ByteDance's Doubao large model had already updated its price list. According to that list, "one yuan can buy 1.25 million tokens of Doubao's flagship model, which is equivalent to Romance of the Three Kingdoms." Coupled with recent moves by Kimi and Zhipu on monetization and pricing, this large-model price war is no less intense than a "618" shopping festival.
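The per-yuan figures quoted above follow directly from the per-thousand-token prices. The sketch below reproduces that arithmetic; the function names are illustrative, and the Doubao per-thousand price is only what the "1.25 million tokens per yuan" claim implies, not a figure taken from a price list. Unrounded, the Qwen-Long input cut works out to 97.5%, which the announcement states as 97%.

```python
from decimal import Decimal

def tokens_per_yuan(price_per_thousand: Decimal) -> Decimal:
    """Tokens that 1 yuan buys at a given price per thousand tokens."""
    return Decimal(1000) / price_per_thousand

def implied_price_per_thousand(tokens_for_one_yuan: Decimal) -> Decimal:
    """Price per thousand tokens implied by a 'tokens per yuan' claim."""
    return Decimal(1000) / tokens_for_one_yuan

def percent_cut(old_price: Decimal, new_price: Decimal) -> Decimal:
    """Size of a price cut as a percentage of the old price."""
    return (old_price - new_price) / old_price * 100

# Qwen-Long input prices quoted in the article (yuan per thousand tokens).
qwen_old = Decimal("0.02")
qwen_new = Decimal("0.0005")

print(f"Qwen-Long: {tokens_per_yuan(qwen_new):.0f} tokens per yuan")
print(f"Qwen-Long input price cut: {percent_cut(qwen_old, qwen_new):.1f}%")

# Doubao: the price implied by "1.25 million tokens for 1 yuan".
doubao_price = implied_price_per_thousand(Decimal(1250000))
print(f"Doubao implied price: {doubao_price:.4f} yuan per thousand tokens")
```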

Price cuts and free access

"It can't be said that ByteDance started the price war. Large models had already been adjusting prices before that." Fan Fan (a pseudonym), an industry practitioner, could not say exactly when large-model vendors began the price war. But he and other industry observers could clearly smell the gunpowder on May 21.

"Breaking through the global floor price" and "the king of cost-effectiveness": as Liu Weiguang walked through the details of the Tongyi Qianwen price cuts, these slogans appeared in large characters on the screen behind him, alongside a chart comparing the price of Qwen-Long with similar products.

According to information provided by Alibaba Cloud, Qwen-Long is the long-text enhanced version of Tongyi Qianwen, with a context length of up to 10 million tokens. In addition to the input price falling to 0.0005 yuan per thousand tokens, the output price of Qwen-Long has dropped by 90% to 0.002 yuan per thousand tokens. By comparison, the input prices of the overseas models GPT-4, Gemini 1.5 Pro and Claude 3 Sonnet and the domestic ERNIE-4.0 are 0.22, 0.025, 0.022 and 0.12 yuan per thousand tokens, respectively.
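To put the comparison in proportion, the input prices above can be expressed as multiples of the new Qwen-Long input price. This is a minimal sketch using only the figures quoted in this article; the dictionary and function names are illustrative.

```python
from decimal import Decimal

# Input prices quoted in the article, in yuan per thousand tokens.
QWEN_LONG_INPUT = Decimal("0.0005")
OTHER_INPUT_PRICES = {
    "GPT-4": Decimal("0.22"),
    "Gemini 1.5 Pro": Decimal("0.025"),
    "Claude 3 Sonnet": Decimal("0.022"),
    "ERNIE-4.0": Decimal("0.12"),
}

def price_multiple(other: Decimal, baseline: Decimal = QWEN_LONG_INPUT) -> Decimal:
    """How many times more expensive `other` is than `baseline`."""
    return other / baseline

for name, price in OTHER_INPUT_PRICES.items():
    print(f"{name}: {price_multiple(price):.0f}x the Qwen-Long input price")
```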

A few hours later, Baidu made a big move of its own, announcing that ERNIE Speed and ERNIE Lite, two flagship models in its Wenxin (ERNIE) large model family, would be completely free, effective immediately. According to the official introduction, ERNIE Speed is the latest self-developed large language model released by Baidu in 2024. It has strong general capabilities, is suitable for fine-tuning as a base model to better handle scenario-specific problems, and offers excellent inference performance. ERNIE Lite is a lightweight large language model developed by Baidu that is suitable for inference on lower-powered AI accelerator cards. Both models support context lengths from 8K to 128K. In the post officially announcing the free access, Baidu Smart Cloud marked both the input and output prices as "free" in red.

Going back a little further, ByteDance's Volcano Engine updated its pricing details late at night on May 20. According to information published on the Volcano Ark model service platform, the post-paid mode for large language models bills by token usage (input text + output text), settled hourly and post-paid according to usage. The Doubao-lite-32k model has a free quota of 500,000 tokens; the input price of its inference service is 0.0003 yuan per thousand tokens, and the output price is 0.0006 yuan per thousand tokens.
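Billing by token usage with separate input and output prices can be sketched as follows. The prices are the Doubao-lite-32k figures quoted above; the request volumes are made-up illustrative numbers, and the free quota is deliberately left out because the article does not say how it is applied.

```python
from decimal import Decimal

# Doubao-lite-32k inference prices quoted in the article (yuan per thousand tokens).
INPUT_PRICE = Decimal("0.0003")
OUTPUT_PRICE = Decimal("0.0006")

def usage_cost(input_tokens: int, output_tokens: int,
               input_price: Decimal = INPUT_PRICE,
               output_price: Decimal = OUTPUT_PRICE) -> Decimal:
    """Cost in yuan for a given number of input and output tokens.

    Assumption: no free quota or volume discount is applied.
    """
    input_cost = Decimal(input_tokens) / 1000 * input_price
    output_cost = Decimal(output_tokens) / 1000 * output_price
    return input_cost + output_cost

# Hypothetical workload: 1,000,000 input tokens and 200,000 output tokens.
cost = usage_cost(1_000_000, 200_000)
print(f"Cost: {cost:.2f} yuan")
```

Because output tokens are priced at twice the input rate here, workloads that generate long responses cost more per token than workloads that mostly read long inputs.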

Only the latest technology can command a price

Alibaba Cloud, Baidu and ByteDance are not the only ones attracting customers with low prices. A Beijing Business Daily reporter logged on to Tencent Cloud's official website and found that four products, including hunyuan-pro and hunyuan-standard-256k, were on sale at a discount of 6.9%. The discounts for the 1,000, 10,000 and 100,000 resource packs were 20%, 7.5% and 30%, respectively.

There were subtler signals as well. On May 6, DeepSeek, a company under High-Flyer Quant (Magic Square Quantification), released DeepSeek-V2, its second-generation MoE (mixture-of-experts) model, with API pricing of 1 yuan per million input tokens and 2 yuan per million output tokens (32K context), nearly 1% of the price of GPT-4-Turbo. On May 13, Zhipu's large-model open platform launched a new pricing system, cutting the call price of its entry-level GLM-3 Turbo model by 80%. OpenAI then launched GPT-4o at half the price of GPT-4 Turbo, charging $5 and $15 per million tokens for input and output, respectively.

Back in China, Doubao updated its price list last week, touting 1.25 million tokens for 1 yuan, equivalent to three copies of Romance of the Three Kingdoms, which brought the large-model price war to the attention of the wider public.

Speaking about the large-model price war, Zhang Chengyu, partner at Analysys and general manager of the Analysys Enterprise Digital Center, told Beijing Business Daily, "The development of large models has gone beyond the traditional framework of Moore's Law, and performance is iterating very quickly. Performance usually doubles roughly every six months, and this cycle is still shortening, which is unprecedented. The falling cost of large models depends not only on cheaper hardware but also on algorithm optimization and progress in model training and deployment technology. For example, pruning, quantization and knowledge distillation can significantly reduce computational complexity and resource consumption."

Wang Chao, founder of the Wenyuan think tank, also likened the development of large models to Moore's Law, saying that "large models will definitely come down in price, and only the latest technology will be able to charge."

The more customers you have, the more you lose?

A token is the smallest unit in which a natural language model processes text. "Long text is only one dimension of large-model competition. Comparing token volumes to the length of dictionaries or Journey to the West is a marketing tactic. Competing on cost-effectiveness per token is actually meaningless; the competition should be over multimodality, recognition, understanding and so on. At present, China's large models are being marketed at a loss: whoever has more users loses more," Wang Chao told Beijing Business Daily.

Regarding the size of their large-model customer bases, a Beijing Business Daily reporter put questions to Alibaba Cloud, Baidu Smart Cloud and Volcano Engine. As of press time, Alibaba Cloud and Baidu Smart Cloud had not disclosed specific data. The person in charge at Volcano Engine told the reporter, "The Doubao large model has established cooperation with leading companies in the mobile phone, computer, automobile, finance and consumer sectors, but because the product has only just been officially released, it is still at a very early stage, and the number of enterprise customers is still small."

"Alibaba Cloud is in a very awkward position. Its huge user base means it cannot be as flexible as other vendors on pricing decisions and subsidies. Even though it has repeatedly championed price cuts, it can easily be overtaken by nimbler competitors," Wang Chao told Beijing Business Daily.

Zhang Chengyu's point of view is that "price competition is dynamic, and the core customer base of the large model is not price-sensitive users. The key to competition lies in who can better balance cost and user value."

Industry insiders have welcomed Kimi's experiment with a "tipping" monetization model. "This means that large-model vendors are exploring diversified ways to monetize. They are no longer limited to traditional membership subscriptions and API call charges, but are also exploring monetization through user interaction and value-added services, which will help the industry close its commercial loop more quickly," Zhang Chengyu said.

On the escalating price war, Wang Chao was blunt: "Large models will definitely charge on the C (consumer) side. If you dare not charge, either you have no confidence in your own technology or you have lost your bearings about the future. Fighting a price war will push a company's consumer-side charging further and further back, and will also delay its ability to generate its own revenue, leaving it to rely on financing to acquire customers. The internet-era approach of offering services for free first and charging later is not suited to large-model competition."

Beijing Business Daily reporter Wei Wei

