AI Code Generators

May 16, 2024

One area of generative artificial intelligence that organizations are already adopting is the use of AI to develop software code. Companies across industries, and software developers in general, are eagerly looking to AI code generators as a means to increase productivity and reduce the costs of developing and maintaining software.

AI code generators are a category of generative AI: algorithms trained on existing software code that can, in turn, generate new code. Tools in this category can, for example, create software code from natural language prompts, offer auto-complete suggestions to developers as they write, and proactively flag bugs during coding.

AI code generators, however, carry some of the same risks that generative AI presents in other use cases. Like all generative AI, AI code generators are dependent on their training data. Beyond general quality issues, this reliance on training data means that security flaws present in the code used to train the algorithm can be reproduced in the code it generates.
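
To make this risk concrete, consider a hypothetical illustration in Python (the function and table names here are invented for the example): if code that builds SQL queries through string formatting is well represented in the training data, a generator may reproduce the same injection-prone pattern rather than the parameterized alternative.

    import sqlite3

    # Hypothetical example of an insecure pattern a code generator might
    # reproduce if similar code appears in its training data.
    def find_user_insecure(conn: sqlite3.Connection, username: str):
        # Building the query with string formatting lets crafted input
        # (e.g., username = "x' OR '1'='1") alter the query: SQL injection.
        query = f"SELECT id, email FROM users WHERE username = '{username}'"
        return conn.execute(query).fetchall()

    def find_user_safe(conn: sqlite3.Connection, username: str):
        # Parameterized query: the driver handles escaping, closing the hole.
        return conn.execute(
            "SELECT id, email FROM users WHERE username = ?", (username,)
        ).fetchall()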

This raises the question of whether organizations that hold themselves out as developing application code in accordance with secure software development industry standards (e.g., those published by NIST and PCI) can use AI code generators and still adhere to those standards.

To answer this question, one must first review the guidance offered by the standard in question. Helpfully, commonly cited development standards like the NIST Secure Software Development Framework and the PCI Secure Software Standard are designed to be independent of specific technologies, techniques, or mechanisms. Instead, these standards typically require organizations to adopt a risk-based approach to determine which practices are relevant, appropriate, and effective for mitigating threats to each company's own software development practices.

Given the risk-based approach of most commonly adopted development standards, organizations can likely leverage AI code generators and still comply with those standards, so long as they identify the use of AI code generators as a risk and implement security controls that address that risk as part of the overall software development lifecycle. Ultimately, each organization will need to support its risk management decisions with documented risk assessments showing that the controls satisfy the relevant control objectives of the applicable standards.

Best practices for securely developing source code with an AI code generator mirror many of the practices an organization would implement to secure source code written by a human. These include:

  • Code reviews – regular human-led code reviews, unit testing, and integration testing can identify potential vulnerabilities at various stages of development.
  • Security scans – Static Application Security Testing (SAST) tools can scan source code to identify potential vulnerabilities, and Dynamic Application Security Testing (DAST) tools can simulate real-world attacks to identify vulnerabilities that SAST misses (a sketch automating these scans follows this list).
  • Software composition analysis – Software Composition Analysis (SCA) tools can scan applications to detect third-party libraries and components and identify known vulnerabilities in them. This can be useful in catching vulnerable dependencies that an AI code generator suggests because they appeared in its training data.
  • Continuous monitoring – regularly reviewing code and staying informed on industry updates on vulnerabilities and security patches can mitigate vulnerabilities in the generated code.
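
Several of these controls can also be automated as a gate in the development pipeline. As a minimal sketch, assuming a Python codebase and using Bandit for SAST and pip-audit for SCA (the tool choices, the src/ directory, and the requirements.txt path are illustrative, not mandated by any standard), a pre-merge script might run both scans and block the merge on findings:

    import subprocess
    import sys

    # Illustrative pre-merge gate combining SAST and SCA scans.
    # Assumed tools and paths: Bandit over src/, pip-audit over requirements.txt.
    CHECKS = [
        ["bandit", "-r", "src/", "-q"],           # SAST: scan source for insecure patterns
        ["pip-audit", "-r", "requirements.txt"],  # SCA: check dependencies for known CVEs
    ]

    def main() -> int:
        failed = False
        for cmd in CHECKS:
            print("Running:", " ".join(cmd))
            # Both tools exit nonzero when they report findings (or fail to run);
            # either outcome should block the merge pending human review.
            if subprocess.run(cmd).returncode != 0:
                failed = True
        return 1 if failed else 0

    if __name__ == "__main__":
        sys.exit(main())

The same checks would typically run in continuous integration as well, so that code produced during day-to-day development, whether written by a human or suggested by an AI tool, passes through identical controls.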

In summary, organizations that wish to use AI code generators while remaining in compliance with typical industry standards for secure software development can likely do so, but they must specifically account for the risks associated with AI code generators and implement controls that mitigate those risks.