Natural Language Processing (NLP) has several important uses in nonproliferation work, including categorising news articles and extracting the names of entities, organisations, and facilities from text. This project focuses on training NLP models tailored to the needs of the nonproliferation regime.
One focus of this project has been preparing a corpus of news articles about nuclear issues and identifying the nuclear terms and entities mentioned in the text. This corpus can be used to train or retrain a machine learning model to extract nuclear-related terms. Applied in a news article analysis workflow, such a model could identify the nuclear facilities mentioned in an article and help determine whether the article contains new information.
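As a rough illustration of the last step, the sketch below flags facility names in an article that have not been recorded before. It is not the project's method: a simple name lookup stands in for the trained model, and the facility names, the `GAZETTEER` set, and the function names are all hypothetical placeholders.

```python
import re

# Hypothetical placeholder names; a trained NER model would replace this lookup.
GAZETTEER = {"Example Site A", "Example Plant B", "Example Reactor C"}


def extract_facilities(text, gazetteer):
    """Return the gazetteer names that occur in the text as whole words."""
    found = set()
    for name in gazetteer:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(name)
    return found


def new_mentions(text, gazetteer, already_seen):
    """Return facilities mentioned in the text that are not in already_seen."""
    return extract_facilities(text, gazetteer) - already_seen


# Illustrative input only, not real article text.
article = "Officials visited Example Site A and Example Plant B on Tuesday."
seen = {"Example Site A"}
flagged = new_mentions(article, GAZETTEER, seen)
```

Here `flagged` holds the facilities that an analyst has not yet seen in earlier articles. In practice the lookup would be replaced by the model's extracted entities, and `seen` by whatever record of previously reported facilities the workflow maintains.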
Going forward, this project will have two points of focus. The first is building out the workflows that apply these NLP models to relevant articles and make use of the results. The second is investigating whether Large Language Models can serve as an alternative to specially trained NLP models for this use case.