Stability AI launches StableCode, an LLM for code generation

好说新闻
2023-08-09 01:56

Stability AI is well known for its Stable Diffusion text-to-image generation model, but that’s not all the generative AI startup is interested in developing. Stability AI is now getting into code generation too.

Today Stability AI announced the first public release of StableCode, its new open large language model (LLM) designed to help users generate programming language code. StableCode is being made available in three versions: a base model for general use cases, an instruction-tuned model, and a long-context-window model that can support up to 16,000 tokens.

The StableCode model benefits from an initial set of programming language data from the open-source BigCode project, with additional filtering and fine-tuning from Stability AI. Initially, StableCode will support development in the Python, Go, Java, JavaScript, C and C++ programming languages, as well as Markdown.

“What we would like to do with this kind of model is to do a similar thing as we did for Stable Diffusion, which helped everyone in the world to become an artist,” Christian Laforte, head of research at Stability AI, told VentureBeat in an exclusive interview. “We’d like to do the same thing with the StableCode model: basically allow anyone that has good ideas [and] maybe has a problem, to be able to write a program that would just fix that problem.”

StableCode: Built on BigCode and big ideas

Training any LLM relies on data, and for StableCode, that data comes from the BigCode project. Using BigCode as the base for a generative AI coding tool is not a new idea: Hugging Face and ServiceNow launched the open StarCoder LLM, which is also built on BigCode, back in May.

Nathan Cooper, lead research scientist at Stability AI, explained to VentureBeat in an exclusive interview that the training for StableCode involved significant filtering and cleaning of the BigCode data.

“We love BigCode, they do amazing work around data governance, model governance and model training,” Cooper said. “We took their datasets and we applied additional filters for quality and also for constructing the large-context-window version of the model, and then we trained it on our cluster.”
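The article does not say which quality filters Stability AI applied. Purely as an illustration, here is a minimal Python sketch of the kind of heuristic, file-level filter often used when cleaning code corpora. The thresholds and the names `FilterConfig` and `passes_quality_filters` are hypothetical, not Stability AI's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterConfig:
    # Hypothetical thresholds for illustration only; the article does not
    # disclose the filters or settings Stability AI actually used.
    max_line_length: int = 1000
    max_mean_line_length: float = 100.0
    min_alnum_fraction: float = 0.25


def passes_quality_filters(source: str, cfg: FilterConfig = FilterConfig()) -> bool:
    """Return True if a source file passes simple heuristic quality checks."""
    lines = source.splitlines()
    if not lines or not source.strip():
        return False  # empty or whitespace-only files carry no training signal
    lengths = [len(line) for line in lines]
    if max(lengths) > cfg.max_line_length:
        return False  # very long lines often indicate minified or generated code
    if sum(lengths) / len(lengths) > cfg.max_mean_line_length:
        return False
    alnum = sum(ch.isalnum() for ch in source)
    if alnum / len(source) < cfg.min_alnum_fraction:
        return False  # mostly symbols or whitespace, e.g. embedded data blobs
    return True


if __name__ == "__main__":
    good = "def add(a, b):\n    return a + b\n"
    minified = "x=1;" * 400  # a single 1,600-character line
    print(passes_quality_filters(good))      # True
    print(passes_quality_filters(minified))  # False
```

Checks like these are cheap to run over millions of files, which is why they typically come before any more expensive model-based filtering.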

Cooper said that Stability AI also executed a number of training steps beyond what is in the core BigCode model. Those steps included successive training on specific programming languages.

“It follows a very similar approach [to what’s] done in the natural language domain, where you start off with pre-training a generalist model and then you fine-tune it on a special set of tasks, or in this case languages,” Cooper said.
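On the data side, the staged approach Cooper describes can be sketched as a schedule: one generalist stage over every file, followed by stages that each continue training on a single language. This is a hypothetical illustration. The extension map, the stage names and the language order are assumptions, because the article does not describe Stability AI's actual recipe.

```python
from pathlib import PurePath

# Hypothetical extension-to-language map covering the languages the article
# lists; how Stability AI actually grouped or ordered them is not stated.
EXTENSIONS = {
    ".py": "Python", ".go": "Go", ".java": "Java", ".js": "JavaScript",
    ".c": "C", ".h": "C", ".cpp": "C++", ".hpp": "C++", ".md": "Markdown",
}


def build_schedule(paths, language_order):
    """Return [(stage_name, files)]: one generalist stage, then one per language."""
    by_language = {}
    for path in paths:
        lang = EXTENSIONS.get(PurePath(path).suffix)
        if lang is not None:  # files in unsupported languages are skipped
            by_language.setdefault(lang, []).append(path)
    # Stage 0: pre-train a generalist model on every recognized file.
    all_files = [p for files in by_language.values() for p in files]
    schedule = [("pretrain-all", all_files)]
    # Later stages: continue training on one language at a time.
    for lang in language_order:
        if by_language.get(lang):
            schedule.append((f"finetune-{lang}", by_language[lang]))
    return schedule


if __name__ == "__main__":
    files = ["a.py", "b.go", "c.py", "README.md", "x.bin"]
    for name, stage_files in build_schedule(files, ["Go", "Python"]):
        print(name, stage_files)
```

Each stage would start from the previous stage's weights, mirroring the "pre-train a generalist, then specialize" pattern from natural language models.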

StableCode’s longer token length a game changer for code generation

Looking beyond its BigCode foundation, StableCode’s long-context version could offer significant benefits to users. 

StableCode’s long-context-window version has a context window of 16,000 tokens, which Stability AI claims is larger than that of any other model. Cooper explained that the longer context window enables the use of more specialized and complex code generation prompts. It also means that a user can have StableCode look at a medium-sized code base that includes multiple files, to help understand and generate new code.

“You can use this longer context window to let the model know more about your code base, and what other functions are defined in other files,” Cooper said. “So that when it does suggest code, it can be more tailor-made to your code base and to your needs.”
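One plausible way to use a 16,000-token window for multi-file context is to pack related files ahead of the file being edited until a token budget runs out. The sketch below is not StableCode's documented prompt format. It assumes a rough 4-characters-per-token estimate (a real tool would count with the model's own tokenizer), a hypothetical `# file:` header, and a caller that has already sorted context files by relevance.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic of about 4 characters per token (an assumption);
    # a real system would count with the model's own tokenizer.
    return max(1, len(text) // 4)


def build_prompt(current_file, context_files, budget=16_000, reserve=512):
    """Pack context files plus the file being edited into one prompt.

    current_file and context_files are (path, text) pairs; context_files is
    assumed to be pre-sorted by relevance. `reserve` leaves token room for
    the model's generated completion.
    """
    cur_path, cur_text = current_file
    current_block = f"# file: {cur_path}\n{cur_text}"
    remaining = budget - reserve - approx_tokens(current_block)
    if remaining < 0:
        raise ValueError("current file alone exceeds the token budget")
    parts = []
    for path, text in context_files:
        block = f"# file: {path}\n{text}\n"
        cost = approx_tokens(block)
        if cost <= remaining:  # skip files that do not fit instead of truncating
            parts.append(block)
            remaining -= cost
    # The current file goes last so the completion continues directly from it.
    parts.append(current_block)
    return "".join(parts)


if __name__ == "__main__":
    context = [("utils.py", "def helper():\n    return 42\n")]
    print(build_prompt(("main.py", "from utils import helper\n"), context))
```

Placing the current file last keeps the text the model continues from at the end of the prompt, and skipping oversized files rather than truncating them avoids handing the model half a function.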

Roping in better code generation with rotary position embedding (RoPE)

StableCode, like most modern large language models, is built on a transformer neural network.

StarCoder, the open generative AI model for coding, uses the ALiBi (Attention with Linear Biases) approach to encode token positions in a transformer model. StableCode instead uses an approach known as rotary position embedding (RoPE).

Cooper said that the ALiBi approach tends to weight tokens close to the current position more heavily than tokens further back. In his view, that’s not ideal for code, since unlike natural language, code doesn’t have a set narrative structure with a beginning, middle and end. Functions can be defined at any point in an application flow.

“I don’t think that coding lends itself to this idea of weighing the present more important than the past, so we use … RoPE, [which] does not have this sort of bias where you’re weighing the present more than the past.”
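To make the contrast Cooper draws concrete, here is a minimal pure-Python sketch of both schemes, following their published formulations as we understand them: ALiBi's geometric per-head slopes, and RoPE's pairwise rotations with base 10,000. It is not StableCode's or StarCoder's code, and the vectors are made-up values for illustration. ALiBi subtracts a penalty that grows linearly with distance regardless of what the tokens contain (heads with smaller slopes are penalized less). RoPE instead rotates query and key vectors, so the score depends on their content and relative offset, with no hard-coded distance penalty.

```python
import math


def alibi_slopes(num_heads):
    # ALiBi slopes for a power-of-two head count:
    # head h (0-based) gets 2 ** (-8 * (h + 1) / num_heads).
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(query_pos, key_pos, slope):
    """ALiBi: a penalty proportional to distance, added to the attention logit."""
    return -slope * (query_pos - key_pos)


def rope(vec, position, base=10000.0):
    """RoPE: rotate each dimension pair (2i, 2i+1) by position * base**(-2i/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even vector dimension")
    out = []
    for i in range(0, d, 2):  # i = 2 * (pair index), so exponent is -i/d
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    slope = alibi_slopes(8)[0]  # 0.5 for the first of 8 heads
    # ALiBi: in this head a key 100 tokens back loses 50 logits, whatever it contains.
    print(alibi_bias(100, 99, slope), alibi_bias(100, 0, slope))  # -0.5 -50.0

    # RoPE: nothing is subtracted for distance. This hypothetical key is chosen
    # so that, 100 tokens back, it lines up with the query and gets the maximum
    # score for unit vectors (1.0); one token back it scores only about 0.04.
    q = [1.0, 0.0]
    k = rope([1.0, 0.0], 100)
    far = dot(rope(q, 100), rope(k, 0))
    near = dot(rope(q, 100), rope(k, 99))
    print(round(far, 6), round(near, 2))
```

Both schemes depend only on the offset between query and key; the difference is that ALiBi's offset term is a fixed penalty, while RoPE's is folded into a content-dependent dot product.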

It’s still early for StableCode, and the goal with the initial release is to see how developers will receive and use the model.

“We are going to be interfacing and working with the community to see what cool directions they come up with, and explore the generative developer space,” Cooper said.

Reprinted from an article by Sean Michael Kerner.
