LLM toolset
IBM Collaborates With NASA for Large Language Model Toolkit to Support Scientific Research
IBM and NASA’s Interagency Implementation and Advanced Concepts Team have jointly developed a suite of large language models that researchers can use to access curated scientific data from diverse sources.
Called INDUS, the toolkit supports research in five scientific domains, including Earth science and astrophysics. The LLM suite also improves access to peer-reviewed studies in the biological and physical sciences and in planetary sciences.
By unlocking these vast data sources, INDUS helps researchers understand complex scientific concepts and explore new research ideas more efficiently, NASA said Tuesday.
The toolkit comprises two types of models: encoders and sentence transformers. The encoders convert natural-language text into numeric representations that the models can process. IBM researcher Bishwaranjan Bhattacharjee said INDUS has achieved “superior performance” thanks to its custom vocabulary and an effective training strategy for the encoder models.
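To make "converting text into numeric representations" concrete, here is a deliberately tiny sketch. It is not INDUS: real encoders learn dense vectors with a neural network, whereas this toy counts words from a small hypothetical vocabulary (`VOCAB`) and normalizes the counts. The functions `encode` and `rank` are illustrative names, not part of INDUS or its Hugging Face release. The ranking step mirrors how sentence-transformer embeddings are typically used for retrieval: each text becomes one vector, and documents are ordered by cosine similarity to a query.

```python
import math
import re
from collections import Counter

# Hypothetical miniature vocabulary, for illustration only. INDUS builds its
# own custom vocabulary from scientific text; these words are not taken from it.
VOCAB = ["aerosol", "albedo", "climate", "exoplanet", "galaxy",
         "ice", "orbit", "sea", "star", "telescope"]
TOKEN_ID = {word: i for i, word in enumerate(VOCAB)}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def encode(text: str) -> list[float]:
    """Map text to a fixed-length, L2-normalized count vector over VOCAB.

    Words outside the vocabulary are ignored; text with no known words
    maps to the all-zero vector.
    """
    counts = Counter(t for t in tokenize(text) if t in TOKEN_ID)
    vec = [0.0] * len(VOCAB)
    for word, n in counts.items():
        vec[TOKEN_ID[word]] = float(n)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Return (score, document) pairs, most similar to the query first."""
    q = encode(query)
    scored = [(cosine(q, encode(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "Sea ice loss changes Earth's albedo and climate.",
        "The telescope observed an exoplanet orbit around a distant star.",
    ]
    for score, doc in rank("How does sea ice affect climate?", docs):
        print(f"{score:.3f}  {doc}")
```

Here the Earth-science sentence ranks first because it shares "sea", "ice" and "climate" with the query, while the astrophysics sentence shares no vocabulary words and scores 0. A learned encoder would also capture synonyms and context, which simple word counts cannot.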
NASA’s Goddard Earth Sciences Data and Information Services Center fine-tuned INDUS using labeled data provided by domain experts and used the model to categorize its data sources.
The INDUS models are available on the Hugging Face website, and researchers can watch for the NASA-IBM team’s release of benchmark datasets for additional tasks, such as extractive question answering in Earth science and multi-domain information retrieval.