Paulina Rybicka, Preludium 22, 1.07.2024 – 30.06.2026
Polish National Science Center
Summary of the project:
The three-dimensional structure of macromolecules, such as proteins, and of their complexes can be determined from the charge density distribution obtained from an X-ray diffraction experiment. Acquiring high-resolution X-ray diffraction data for macromolecules is challenging, but such data are necessary to reconstruct the electron density of structures with the aspherical Multipole Model (MM), which outperforms the spherical Independent Atom Model (IAM) in describing the deformation of electron density caused by chemical bonds and lone electron pairs. To get past this limitation and make use of low-resolution data, the multipole model parameters are treated as transferable between chemically equivalent atoms, and the resulting atom-type definitions, with averaged parameters, are stored in data banks. The MATTS (Multipolar Atom Types from Theory and Statistical clustering) data bank, developed by our group, was created using molecules from the Cambridge Structural Database (CSD) and gathers MM parameters for 651 atom types. In this project, we want to solve a problem shared by the currently existing aspherical data banks: even though they can be used to reconstruct the electron density of macromolecules, atom types cannot be assigned when a structure suffers from disorder, missing atoms, a lack of hydrogen atoms, distorted geometry, or atoms placed too close to each other. To this end, we will modify the previously established procedure for applying the MATTS data bank by introducing a new approach to atom-type recognition tailored specifically to proteins, RNA, DNA, and the most common ligands. The new data bank will use the information deposited in the Protein Data Bank (PDB). Specifically, it will recognize atom types directly from the names of the atoms present in the structures of proteins, RNA, DNA, and the most common ligands, as defined in the mmCIF format.
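The idea of recognizing atom types by name can be illustrated with a minimal sketch. It is not the project's implementation: the table contents, the type labels, and the function names are hypothetical placeholders, not actual MATTS atom types. The only real identifiers referenced are the mmCIF data items `_atom_site.label_comp_id`, `_atom_site.label_atom_id`, and `_atom_site.label_alt_id`, which hold the residue name, the atom name, and the conformer label.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class AtomRecord(NamedTuple):
    comp_id: str  # residue or ligand name (_atom_site.label_comp_id)
    atom_id: str  # atom name (_atom_site.label_atom_id)
    alt_id: str   # conformer label (_atom_site.label_alt_id), "." if none


# Hypothetical name-based table. The labels on the right are placeholders
# invented for this sketch; they are NOT real MATTS atom-type names.
ATOM_TYPE_BY_NAME: Dict[Tuple[str, str], str] = {
    ("ALA", "N"): "amide_N",
    ("ALA", "CA"): "alpha_C",
    ("ALA", "C"): "carbonyl_C",
    ("ALA", "O"): "carbonyl_O",
    ("ALA", "CB"): "methyl_C",
}


def assign_atom_type(atom: AtomRecord) -> Optional[str]:
    """Look up the atom type from the residue and atom names only.

    No coordinates are used, so the result does not depend on geometry.
    Returns None when the (residue, atom) pair is not in the table.
    """
    return ATOM_TYPE_BY_NAME.get((atom.comp_id, atom.atom_id))


def assign_by_conformer(
    atoms: Iterable[AtomRecord],
) -> Dict[str, Dict[Tuple[str, str], Optional[str]]]:
    """Group the assigned types by conformer label (A, B, ...)."""
    result: Dict[str, Dict[Tuple[str, str], Optional[str]]] = {}
    for atom in atoms:
        per_conformer = result.setdefault(atom.alt_id, {})
        per_conformer[(atom.comp_id, atom.atom_id)] = assign_atom_type(atom)
    return result


atoms = [
    AtomRecord("ALA", "CB", "A"),
    AtomRecord("ALA", "CB", "B"),
    AtomRecord("ALA", "XX", "."),  # a name missing from the table
]
assignments = assign_by_conformer(atoms)
```

Because the lookup key contains only names, the same type is returned for both conformers of the `CB` atom regardless of distorted geometry or missing neighbours, while an unknown name yields `None` instead of a wrong assignment.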
The problem of distorted geometry, which is beyond the reach of the current MATTS data bank, would be handled by the new data bank because it relies on atom names and differentiates between the various conformers (A, B, etc.) of the protein. It could also resolve problems with hydrogen atoms in atomic-resolution structures; these atoms are difficult to locate because the electron density of their single electron is deformed towards the bond. The realization of this project will improve the quality of both molecular structure determination and the reconstructed electron density distribution of proteins, RNA, DNA, and the most common ligands. The data bank can be used in structure refinement against X-ray diffraction data, electron diffraction data, and single-particle cryo-EM data. This is crucial for structural biology and chemistry, as the analysis of the charge-density distribution makes it possible to describe the functions of macromolecules, discover how changes in structure affect them, and understand the interactions present in molecular complexes.