Learning to program is not easy, and teaching a machine to do it seems like an impossible task, but it is not.
IBM’s AI research division has released a dataset of 14 million samples to develop machine learning models that can aid in programming tasks.
It is called Project CodeNet, and its goal is not to make human programmers redundant but to make developers more productive.
The dataset contains 500 million lines of code written in 55 different programming languages, submitted as solutions to 4,000 challenges posted on the AIZU and AtCoder online coding platforms. The code samples include both correct and incorrect answers to the challenges. The examples also carry a large number of annotations, such as a textual description along with CPU time and memory limits. Each code submission has various pieces of information, including language, submission date, size, execution time, acceptance, and error types.
As bdtechtalks.com notes, this metadata accompanying the coding samples is perhaps the most important part.
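To give an idea of why per-submission metadata is useful, here is a minimal Python sketch that groups submissions by language and by error type. The `Submission` class, its field names, and the three sample records are made up for illustration; they mirror the fields listed above but are not CodeNet's actual file format or data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    # Hypothetical field names, chosen to mirror the metadata described above.
    language: str
    submission_date: str
    size_bytes: int
    execution_time_ms: int
    accepted: bool
    error_type: Optional[str]  # None when the submission was accepted


# Invented sample records, for illustration only.
samples = [
    Submission("Python", "2020-01-01", 120, 30, True, None),
    Submission("C++", "2020-01-02", 340, 5, False, "Compile Error"),
    Submission("Python", "2020-01-03", 150, 2100, False, "Time Limit Exceeded"),
]


def acceptance_rate(subs):
    """Return the fraction of accepted submissions per language."""
    total = Counter(s.language for s in subs)
    ok = Counter(s.language for s in subs if s.accepted)
    # Counter returns 0 for languages with no accepted submissions.
    return {lang: ok[lang] / n for lang, n in total.items()}


def error_counts(subs):
    """Count rejected submissions by error type."""
    return Counter(s.error_type for s in subs if not s.accepted and s.error_type)


if __name__ == "__main__":
    print(acceptance_rate(samples))
    print(error_counts(samples))
```

Labels like these are what would let a model learn from incorrect answers as well as correct ones, for example by pairing a rejected submission with an accepted one for the same challenge.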
CodeNet could help to:
– Develop machine learning models for programming tasks.
– Translate between programming languages, porting code from one language to another.
– Develop machine learning models for code recommendation (improved autocompletion).
– Develop code optimization systems.
– Train machine learning systems that point out possible flaws in source code.
Generating code from scratch is the harder problem; for now, the work has been limited to code classification, code similarity evaluation, and code completion.
We will closely follow the evolution of the project at github.com/IBM/Project_CodeNet.