In this section, we describe the functionality of KIST-NOMAD and explain its importance.
The search GUI, the search API, the GUI search results, and the download of data and files
are described in detail with examples.
3.1 The data search GUI
KIST-NOMAD provides a neat and intuitive data search GUI design with a well laid out
sequence of information, which ensures easy navigation. It has both selectable and
text input functions. The search GUI has two main sections, the ‘Chemical Elements’ and the ‘Search Conditions.’
1. Chemical Elements – This shows the periodic table. Clicking an element adds it to the element text box
in the search conditions section (see Table 2).
2. Search Conditions – These are search conditions chosen from familiar materials properties
to give users effective options when searching for data. The search conditions
are as follows.
a. Element – The selected element(s) from the periodic table will appear in this box. It also
allows for direct user input.
b. Crystal System – The specified crystal system is based on the various classes of space groups.
c. System Type – The available system type options are 0D/Cluster, 1D, 2D/Surface-Adsorption, 3D/Bulk
and Atom/Molecule.
d. Method – The method is a list of computational codes. The available options are Abinit,
BigDFT, Quantum Espresso and VASP.
e. Basis Set Type – This is the type of basis set used in the calculation. One option can be selected per search query. The available
options are Plane Waves, Gaussian and Wavelets.
f. XC Functional – This selects data computed with a specific exchange-correlation functional. The available
options include GGA, DFT+U and Hybrid.
g. Authors – This is a list of users who have uploaded data to the repository.
h. Compound Type – The compound type option is based on the number of elements present in each calculation.
The compound types and their corresponding numbers of elements are shown in Table 3.
i. Access Type – Restricted or Open Access permission. Open Access is the default option, which means that users can download
both data files and search results.
3.1.2 Data search
In a data repository, the primary activity is searching for data. KIST-NOMAD implements
a search algorithm designed to handle user requests quickly and reliably. A search
can be performed as soon as a chemical element
or formula is specified. The web implementation uses Java Persistence Query Language
(JPQL) [24] to form a query from the selected elements and search conditions. The
select statement also uses regular expressions to define a search pattern in the query.
The defined search pattern restricts the results to the requested data. The default
data access type is always added to the query. The JPQL query is converted into a Structured
Query Language (SQL) select statement, which is then passed to the database to retrieve
data from the materialized views. SQL is used to communicate with relational databases
and perform tasks such as select, update and delete.
For example, to retrieve all the aluminum-based computation results, we would use
a JPQL query such as SELECT e FROM new_view_grouped e WHERE ((e.chemicalFormula =
'Al' OR e.chemicalFormula REGEXP 'Al[0-99].*')) AND e.permission = :accesstype. This
means ‘select all aluminum computational results that have open access permission.’ In this command, the most important part is the regular expression ‘Al[0-99].*’.
The character class [0-99] is equivalent to [0-9], so the pattern matches ‘Al’ followed
by one digit between ‘0’ and ‘9’ and then any sequence of further characters. The first ten
records of the searched results using the above query are shown in Fig 5.
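The query construction and the behavior of the search pattern can be sketched in Python. This is a minimal illustration, not the KIST-NOMAD implementation (which uses JPQL on the server): the function names build_element_query and formula_matches are hypothetical, the element-symbol check is an added safeguard, and Python's re.search is assumed to behave like the database REGEXP operator (unanchored, case-sensitive matching).

```python
import re


def build_element_query(element: str) -> str:
    """Return a JPQL string shaped like the example query, for one element.

    Hypothetical helper: the check below only guards against symbols that
    would break the quoted literals in the query string.
    """
    if not re.fullmatch(r"[A-Z][a-z]?", element):
        raise ValueError(f"not an element symbol: {element!r}")
    pattern = f"{element}[0-99].*"  # same pattern shape as in the text
    return (
        "SELECT e FROM new_view_grouped e WHERE "
        f"((e.chemicalFormula = '{element}' OR "
        f"e.chemicalFormula REGEXP '{pattern}')) "
        "AND e.permission = :accesstype"
    )


def formula_matches(element: str, formula: str) -> bool:
    """Emulate the chemicalFormula part of the WHERE clause in plain Python.

    Assumption: REGEXP matches anywhere in the string, like re.search.
    """
    pattern = f"{element}[0-99].*"
    return formula == element or re.search(pattern, formula) is not None
```

Under these assumptions, formula_matches("Al", "Al2O3") is true because a digit follows ‘Al’, while a formula such as AlN, in which no digit follows ‘Al’, is not matched by the pattern.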
Chemical Formula, Space Group, Total Atom Number, Total Energy, Magnetic Moment, Band
Gap, Band Gap Type, Cell Optimized, XC Functional, Code Versions, Encut, KPoints and
PSP Versions are materials properties extracted from the uploaded calculation files
with parsers and scripts. System Type is selected during upload. References to any
published work are hyperlinked in the References column. The Author(s) column gives
the name of the user who uploaded the calculation files; it is automatically added
to the calculation. Any coauthors are added by the user. All uploaded
files for a calculation can be viewed in the Uploaded Files column.
3.1.3 GUI data search result
The results of GUI data search are presented in a table format and displayed on the
results GUI. The results set is a carefully selected set of materials properties which
are descriptive of the calculation they represent. The results among other things
also allow for the quantitative comparison of calculation data. Each column in the
table presents a specific materials property as defined in the database. An N/A entry
in any column means that the data is not available.
The total energy column displays the total energy of the calculation in electron volts
(eV) at a temperature of 0 K. This is the final energy(sigma → 0) value in the VASP OUTCAR
file. The notation (sigma → 0) means that the SIGMA value, the smearing width that plays
the role of an electronic temperature in VASP calculations, is extrapolated to zero; hence the energy
(sigma → 0) is equal to the energy at 0 K.
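Reading this value from an OUTCAR file can be sketched as follows. This is an illustration rather than the KIST-NOMAD parser: it assumes that the value appears in the file as "energy(sigma->0) = <number>" and that the last occurrence is the final one. The function name and the sample text are hypothetical, and the numbers in the sample are placeholders, not calculation results.

```python
import re
from typing import Optional

# Assumed OUTCAR spelling of the quantity written as energy(sigma → 0) in the text.
ENERGY_RE = re.compile(r"energy\(sigma->0\)\s*=\s*([-+]?\d+(?:\.\d+)?)")


def final_energy_sigma0(outcar_text: str) -> Optional[float]:
    """Return the last energy(sigma->0) value in eV, or None if absent."""
    matches = ENERGY_RE.findall(outcar_text)
    return float(matches[-1]) if matches else None


# Placeholder lines imitating two ionic steps; the numbers are made up.
SAMPLE = """\
  energy  without entropy=      -10.84512345  energy(sigma->0) =      -10.84601234
  energy  without entropy=      -10.85012345  energy(sigma->0) =      -10.85098765
"""
```

Returning None when no value is found corresponds to the N/A entry shown in the results table.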
In the band gap column, there are three types of values: --, N/A, and a numerical value
such as 0.007. If the calculated band gap is less than 0.005, it is represented as
‘--' in the result set. Any band gap value greater than or equal to 0.005 is
presented together with the band gap type.
For VASP calculations, the condition for calculating the band gap is that the sum
of the total drift in the final relaxation step be less than or equal to 0.001 (≤0.001).
If this condition is not met, the band gap value is marked as N/A.
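The display rules for the band gap column can be summarized in a short sketch. The function name, the argument names, and the "value (type)" output format are assumptions made for illustration; only the two thresholds come from the text.

```python
from typing import Optional

GAP_THRESHOLD = 0.005  # gaps below this value are shown as "--"
DRIFT_LIMIT = 0.001    # the total drift must be <= this value for a gap to be computed


def band_gap_cell(gap: Optional[float],
                  gap_type: Optional[str],
                  total_drift: Optional[float]) -> str:
    """Return the text shown in the band gap column for one calculation."""
    # The drift condition is checked first: if it fails, no band gap is reported.
    if gap is None or total_drift is None or total_drift > DRIFT_LIMIT:
        return "N/A"
    if gap < GAP_THRESHOLD:
        return "--"
    # Assumed presentation of "value together with the band gap type".
    return f"{gap} ({gap_type})"
```

For example, band_gap_cell(0.007, "direct", 0.0005) yields "0.007 (direct)", whereas any gap paired with a total drift of 0.002 yields "N/A".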
Magnetic moment values are only retrieved for spin-polarized calculations. N/A is presented
for calculations without spin polarization. Cell optimized is determined by the value of the Pulay
stress. Yes is for a Pulay stress of 0.0 kB, while No is for any other value. Space
group is presented in the Hermann-Mauguin notation [25]. The space group symbol is a combination of an uppercase letter for the lattice
type and symbols identifying the symmetry elements. For example, in space group Pmmm,
P is the lattice type and mmm denotes the symmetry elements.
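These three rules can be expressed as small helper functions. This is a sketch under stated assumptions: the function names are hypothetical, the magnetic moment is assumed to be available as a number, and the space group symbol is assumed to be a plain string such as Pmmm.

```python
from typing import Optional, Tuple


def magnetic_moment_cell(is_spin_polarized: bool, moment: Optional[float]) -> str:
    """Show the magnetic moment only for spin-polarized calculations."""
    if not is_spin_polarized or moment is None:
        return "N/A"
    return str(moment)


def cell_optimized_cell(pulay_stress_kb: float) -> str:
    """Yes for a Pulay stress of exactly 0.0 kB, No for any other value."""
    return "Yes" if pulay_stress_kb == 0.0 else "No"


def split_space_group(symbol: str) -> Tuple[str, str]:
    """Split a Hermann-Mauguin symbol into lattice letter and symmetry part."""
    if not symbol or not symbol[0].isupper():
        raise ValueError(f"unexpected space group symbol: {symbol!r}")
    return symbol[0], symbol[1:]
```

With these definitions, split_space_group("Pmmm") returns the pair ("P", "mmm"), matching the example in the text.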
The K-Points column displays three kinds of values. Two are for non-band-structure
calculations, for example 8x8x8(M) and 8x8x8(G), where (M) represents Monkhorst-Pack
and (G) represents Gamma. The third is for band-structure calculations,
for example Line-mode(20). Line-mode indicates that the calculation is for band structures,
and (20) is the number of steps. When Line-mode(20) is selected, the content of
the KPOINTS file is displayed.
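A user who downloads the results may want to turn these labels back into structured values. The following sketch assumes that the labels have exactly the forms shown above (for example 8x8x8(M) or Line-mode(20)); the function name and the returned dictionary keys are hypothetical.

```python
import re
from typing import Any, Dict

GRID_RE = re.compile(r"^(\d+)x(\d+)x(\d+)\((M|G)\)$")
LINE_RE = re.compile(r"^Line-mode\((\d+)\)$")
SCHEMES = {"M": "Monkhorst-Pack", "G": "Gamma"}


def parse_kpoints_label(label: str) -> Dict[str, Any]:
    """Parse one K-Points column value into a small dictionary."""
    grid = GRID_RE.match(label)
    if grid:
        nx, ny, nz, scheme = grid.groups()
        return {
            "kind": "grid",
            "grid": (int(nx), int(ny), int(nz)),
            "scheme": SCHEMES[scheme],
        }
    line = LINE_RE.match(label)
    if line:
        # Band-structure calculation: the number is the count of steps.
        return {"kind": "line-mode", "steps": int(line.group(1))}
    raise ValueError(f"unrecognized K-Points label: {label!r}")
```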
3.1.4 The API data search
KIST-NOMAD also provides a RESTful application programming interface (API) with functions
for searching and retrieving data and for downloading archive data files. APIs
enable data exchange between two applications. A user sends a data retrieval request
to the database through the API. The application retrieves the data,
performs any necessary actions and presents the results to the user in JavaScript
Object Notation (JSON) format [26]. The returned result does not include any materials properties but rather URLs to
the calculation archive files, as shown in Fig 6.
The URL for the KIST-NOMAD API is http://nomad.kist.re.kr:8080/nomad/rest/api/search.
As in the GUI data search, search conditions can also be specified when using the API. The
following case-sensitive keywords can be appended to the URL as search conditions:
element, system_type, crystal_system, calculation, basis_set_type, xctreatment, author
and compoundType. The conditions can be used individually or combined as required.
For example, element=Si is appended to the API URL to retrieve all silicon computation
results in the database: http://nomad.kist.re.kr:8080/nomad/rest/api/search?element=Si
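A client-side sketch of building such a request in Python is given below. Only the base URL and the keyword names come from the text. The helper names are hypothetical, combining several conditions with ‘&’ is assumed to follow the usual query-string convention, and the structure of the returned JSON is not assumed beyond it being valid JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://nomad.kist.re.kr:8080/nomad/rest/api/search"

# Case-sensitive keywords listed in the text.
KEYWORDS = (
    "element", "system_type", "crystal_system", "calculation",
    "basis_set_type", "xctreatment", "author", "compoundType",
)


def build_search_url(**conditions: str) -> str:
    """Append the given search conditions to the API URL."""
    unknown = [key for key in conditions if key not in KEYWORDS]
    if unknown:
        raise ValueError(f"unsupported keyword(s): {unknown}")
    if not conditions:
        return API_URL
    return f"{API_URL}?{urlencode(conditions)}"


def fetch_results(url: str, timeout: float = 30.0):
    """Send the request and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling build_search_url(element="Si") reproduces the example URL above. Because the keywords are case sensitive, a misspelling such as Element is rejected by the sketch before any request is sent.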
3.2 Downloading data and results files
All KIST-NOMAD open access data and data files are available for download. The
download of data and files is made possible by three download functions, which are
available on the results GUI. The three functions allow the download of (1) materials
data in csv format, (2) archived files in zipped format, and (3) individual files,
also in zipped format.
3.2.1 Materials data download
All the materials properties presented in the result set are downloadable in comma-separated
values (csv) format. Materials data in csv format is useful as input for machine
learning and data analysis tasks. The data in the csv file is in the same order and
format as the result set, from chemical formula to pseudopotential (psp) versions;
these properties are included because they are commonly used for analysis and machine learning. The
user can select up to 100 results (the maximum number of results
per page) for downloading at one time. To keep the csv content in the same format as the
search result set, values are formatted on export; for example, the space group
is written in the Hermann-Mauguin notation instead of as a number.
The CSVWriter of OpenCSV [32] is used to write the database values into the csv file.
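The export step can be illustrated with Python's csv module in place of OpenCSV. This is a sketch, not the KIST-NOMAD code: the function name, the three-column subset, and the small number-to-symbol table are assumptions chosen to show the space group formatting and the 100-result limit.

```python
import csv
import io
from typing import Iterable, Tuple

MAX_ROWS = 100  # maximum number of results per download, as stated in the text

# Small excerpt of space group numbers and their Hermann-Mauguin symbols.
HM_SYMBOLS = {47: "Pmmm", 221: "Pm-3m", 225: "Fm-3m"}


def results_to_csv(rows: Iterable[Tuple[str, int, float]]) -> str:
    """Write (formula, space group number, total energy) rows as csv text."""
    rows = list(rows)
    if len(rows) > MAX_ROWS:
        raise ValueError(f"at most {MAX_ROWS} results can be exported at once")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Chemical Formula", "Space Group", "Total Energy"])
    for formula, sg_number, energy in rows:
        # The symbol is written instead of the number; unknown numbers become N/A.
        writer.writerow([formula, HM_SYMBOLS.get(sg_number, "N/A"), energy])
    return buffer.getvalue()
```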
3.2.2 Compressed/archived files download
The archived/compressed file for each calculation can be downloaded. These files are
stored in KIST-NOMAD’s uploads file directory. Downloading the archived files is particularly
useful when all the uploaded files for a calculation are needed, whether for many calculations or a few.
The archived files of the selected calculations are placed in a zipped folder during
download. The archived files for the entire result set are available for download, but
only 100 can be downloaded at one time.
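Placing the selected archives in one zipped folder can be sketched with Python's zipfile module. The function name and the in-memory representation of the archives (a mapping from file name to bytes) are assumptions for illustration; the server-side implementation is not described at this level in the text.

```python
import io
import zipfile
from typing import Dict

MAX_ARCHIVES = 100  # download limit per request, as stated in the text


def bundle_archives(archives: Dict[str, bytes]) -> bytes:
    """Pack the selected calculation archives into a single zip file."""
    if len(archives) > MAX_ARCHIVES:
        raise ValueError(f"at most {MAX_ARCHIVES} archives per download")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        for name, payload in archives.items():
            bundle.writestr(name, payload)
    return buffer.getvalue()
```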
3.2.3 Individual files download
Additionally, for each calculation result, the individual input and output files, such
as OUTCAR, POSCAR, KPOINTS, etc., can be downloaded. These files are stored in
the extracted data files directory. This download is useful when only specific calculation
files are needed. The uploaded files for a calculation are shown in Fig 7. The files can be downloaded from this GUI.
3.3 Uploading calculation files
Uploading calculation data files to KIST-NOMAD is simplified by the use of an upload
GUI. Multiple calculation files in .tar.gz format can be uploaded at a time. During
the upload, the system type (2D, 3D, etc.) of the calculations to be uploaded must
be selected. A log-in account is required to upload data files.
The uploaded files are first kept in the uploads directory. They are then copied to
the extracted folder where they are extracted. Parsers and scripts then automatically
extract and calculate all the defined materials properties from the designated files
and save them in the database. Because KIST-NOMAD aims to provide high-quality, reliable materials
data to users, the parsers and scripts are written with accuracy as a priority.
Information about the uploader is also saved in the database and mapped
in a one-to-many relationship to their calculations. This process is illustrated in
Fig 8.
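The first two steps of this process (storing the upload, then copying and extracting it) can be sketched as follows. The function and directory names are hypothetical, the parsing and database steps are omitted, and the safe-extraction filter is used only where the running Python version provides it.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def stage_and_extract(upload: Path, uploads_dir: Path, extracted_dir: Path) -> Path:
    """Keep the upload in uploads_dir, copy it to extracted_dir, and unpack it."""
    if not upload.name.endswith(".tar.gz"):
        raise ValueError("only .tar.gz uploads are accepted")
    uploads_dir.mkdir(parents=True, exist_ok=True)
    extracted_dir.mkdir(parents=True, exist_ok=True)
    stored = uploads_dir / upload.name
    shutil.copy2(upload, stored)   # step 1: keep the original archive
    staged = extracted_dir / upload.name
    shutil.copy2(stored, staged)   # step 2: copy it to the extracted folder
    target = extracted_dir / upload.name[: -len(".tar.gz")]
    target.mkdir(exist_ok=True)
    with tarfile.open(staged, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            archive.extractall(target, filter="data")  # refuses unsafe members
        else:
            archive.extractall(target)
    return target


def demo() -> list:
    """Round-trip a tiny placeholder archive inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "OUTCAR").write_text("placeholder")
        upload = root / "calc.tar.gz"
        with tarfile.open(upload, "w:gz") as archive:
            archive.add(root / "OUTCAR", arcname="OUTCAR")
        target = stage_and_extract(upload, root / "uploads", root / "extracted")
        return sorted(path.name for path in target.iterdir())
```

Keeping the untouched archive in the uploads directory while working on a copy means that the original upload remains available for the archived files download.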
All the details of uploaded calculations are instantly available to the owner (the user
who uploaded them) and are read-only to all other system users. The owner can grant
file and data download access to their calculations by removing the read-only restriction
(making them open access) or by sharing the calculations with selected individuals
or groups. If there are any published works based on the uploaded calculations,
references to them can be added. All these functions are available on the upload GUI, as shown in
Fig 9.
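The access rules described above can be modeled with a small data class. This is an illustrative sketch with hypothetical names: groups are left out, and the owner is assumed to always have download access to their own calculations.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Calculation:
    owner: str
    open_access: bool = False  # read-only to other users until the owner changes it
    shared_with: Set[str] = field(default_factory=set)
    references: List[str] = field(default_factory=list)

    def can_view(self, user: str) -> bool:
        """Calculation details are visible to every system user."""
        return True

    def can_download(self, user: str) -> bool:
        """Downloads need ownership, open access, or an explicit share."""
        return user == self.owner or self.open_access or user in self.shared_with
```

In this model, removing the read-only restriction corresponds to setting open_access to True, and sharing with an individual corresponds to adding that user to shared_with.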