What's Changed (Aug 31, 2024)
Data Updates in AlzKB:
- DrugBank: Updated to version 5.1.12 (2024-03-14)
- NCBI Gene: Updated to V2024-05-13
- Gene Ontology: Updated to V2024-04-24*
- MeSH: Updated to V2023-12*
- Uberon: Updated to V2024-03-22*
- DrugCentral: Updated to V2023-11-01*
- BindingDB: Updated to V2024-05*
- MEDLINE: Updated to V2024-05-02*
*Updates based on Hetionet. Please see the alzkb-updates GitHub repository for more details.
Enhancements:
- Added TranscriptionFactor nodes and TRANSCRIPTIONFACTORINTERACTSWITHGENE relationships.
- Added chromosome number as a property of gene nodes.
- Added sourcedatabase as a property of nodes.
- Added correlation, score, p_fisher, z_score, affinity_nm, confidence, sourcedatabase, and unbiased (from Hetionet, DisGeNET, and DoRothEA) as properties of relationships.

Relationship | Properties |
---|---|
CHEMICALBINDSGENE | sourceDB, unbiased, affinity_nM |
CHEMICALDECREASESEXPRESSION | sourceDB, unbiased, z_score |
CHEMICALINCREASESEXPRESSION | sourceDB, unbiased, z_score |
DISEASELOCALIZESTOANATOMY | sourceDB, unbiased, p_fisher |
GENEASSOCIATESWITHDISEASE | sourceDB, score |
GENECOVARIESWITHGENE | sourceDB, unbiased, correlation |
SYMPTOMMANIFESTATIONOFDISEASE | sourceDB, unbiased, p_fisher |
TRANSCRIPTIONFACTORINTERACTSWITHGENE | sourceDB, confidence |
The instructions for adding new data resources and importing data into the Memgraph graph database are available in the alzkb GitHub repository.
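As a rough illustration of how the relationship properties listed above might be inspected, the following sketch builds a Cypher query string for one relationship type. The helper `build_property_query` and the `REL_PROPERTIES` dictionary are hypothetical names introduced here, and the property names are copied verbatim from the list above; they should be checked against the actual database, since the stored names may differ (for example `sourcedatabase` versus `sourceDB`).

```python
# Relationship types and their property names, copied from the release notes.
# Assumption: these are the exact property keys stored in the graph.
REL_PROPERTIES = {
    "CHEMICALBINDSGENE": ["sourceDB", "unbiased", "affinity_nM"],
    "CHEMICALDECREASESEXPRESSION": ["sourceDB", "unbiased", "z_score"],
    "CHEMICALINCREASESEXPRESSION": ["sourceDB", "unbiased", "z_score"],
    "DISEASELOCALIZESTOANATOMY": ["sourceDB", "unbiased", "p_fisher"],
    "GENEASSOCIATESWITHDISEASE": ["sourceDB", "score"],
    "GENECOVARIESWITHGENE": ["sourceDB", "unbiased", "correlation"],
    "SYMPTOMMANIFESTATIONOFDISEASE": ["sourceDB", "unbiased", "p_fisher"],
    "TRANSCRIPTIONFACTORINTERACTSWITHGENE": ["sourceDB", "confidence"],
}


def build_property_query(rel_type: str, limit: int = 10) -> str:
    """Return a Cypher query listing the properties of one relationship type."""
    props = REL_PROPERTIES[rel_type]  # raises KeyError for unknown types
    returned = ", ".join(f"r.{p} AS {p}" for p in props)
    return f"MATCH ()-[r:{rel_type}]->() RETURN {returned} LIMIT {limit}"


if __name__ == "__main__":
    print(build_property_query("CHEMICALBINDSGENE"))
```

The resulting string could then be run in any Cypher client connected to the database; the sketch deliberately stops at building the query so that it makes no assumptions about the connection setup.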
Data Quality Improvements:
- Removed the mapping between Creutzfeldt-Jakob disease (CJD) and Familial Alzheimer Disease (FAD). CJD and FAD are different diseases but were merged into the same node in AlzKB because the DisGeNET “disease_mappings.tsv” file maps CJD to FAD.
- Filtered genes to keep human genes only (tax-id = 9606).
- Implemented case-insensitive matching when extracting Alzheimer’s data from DisGeNET to include disease names that are in all caps.
- Consolidated pathways with the same names but different values of pathwayid and sourcedatabase.
- Removed duplicated pathways from AOP-DB that have “Homo sapiens (human)” in their names.
- Removed 21,724 Drug nodes from AOP-DB that had only xrefmesh values, had NULL as commonName, and were not connected to any other nodes.
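Two of the steps above, the human-only gene filter and the case-insensitive disease matching, can be sketched as follows. This is a minimal illustration, not the AlzKB build code: the field name `tax_id`, the helper names, and the use of the substring "alzheimer" as the matching criterion are all assumptions made for the example.

```python
HUMAN_TAX_ID = "9606"  # NCBI taxonomy ID for Homo sapiens, as stated above


def keep_human_genes(rows):
    """Keep only gene records whose taxonomy ID is 9606.

    Assumes each record is a dict with a hypothetical 'tax_id' field.
    """
    return [row for row in rows if str(row["tax_id"]).strip() == HUMAN_TAX_ID]


def is_alzheimer_disease(name: str) -> bool:
    """Case-insensitive check, so all-caps disease names also match.

    The substring 'alzheimer' is an assumed criterion for illustration.
    """
    return "alzheimer" in name.casefold()


if __name__ == "__main__":
    genes = [
        {"symbol": "APOE", "tax_id": "9606"},
        {"symbol": "Apoe", "tax_id": "10090"},
    ]
    print(keep_human_genes(genes))
    print(is_alzheimer_disease("ALZHEIMER DISEASE, FAMILIAL"))
```

Using `casefold()` rather than comparing against a fixed capitalization is what lets names written in all caps pass the check.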
Summary of the changes in nodes and relationships
Nodes:
Label | NodeCount | NodeCount previous version | NumChanges |
---|---|---|---|
BiologicalProcess | 12322 | 11381 | 941 |
BodyPart | 652 | 402 | 250 |
CellularComponent | 1695 | 1391 | 304 |
Disease | 34 | 20 | 14 |
Drug | 16581 | 36959 | -20378 |
DrugClass | 474 | 345 | 129 |
Gene | 193279 | 193313 | -34 |
MolecularFunction | 3460 | 2884 | 576 |
Pathway | 4516 | 4570 | -54 |
Symptom | 505 | 438 | 67 |
TranscriptionFactor | 519 | | 519 |
Total | 234037 | 251703 | -17666 |
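The NumChanges column appears to be the current count minus the previous count. The sketch below recomputes it from the node counts in the table as a consistency check. It assumes that TranscriptionFactor, being new in this release, had a previous count of 0; the names `NODE_COUNTS` and `num_changes` are introduced here for illustration only.

```python
# (current count, previous count) per node label, copied from the table above.
NODE_COUNTS = {
    "BiologicalProcess": (12322, 11381),
    "BodyPart": (652, 402),
    "CellularComponent": (1695, 1391),
    "Disease": (34, 20),
    "Drug": (16581, 36959),
    "DrugClass": (474, 345),
    "Gene": (193279, 193313),
    "MolecularFunction": (3460, 2884),
    "Pathway": (4516, 4570),
    "Symptom": (505, 438),
    "TranscriptionFactor": (519, 0),  # assumption: label did not exist before
}


def num_changes(counts):
    """Return current minus previous count for every label."""
    return {label: cur - prev for label, (cur, prev) in counts.items()}


changes = num_changes(NODE_COUNTS)
total_current = sum(cur for cur, _ in NODE_COUNTS.values())
total_previous = sum(prev for _, prev in NODE_COUNTS.values())

if __name__ == "__main__":
    for label, delta in changes.items():
        print(f"{label}: {delta:+d}")
    print(f"Total: {total_current} vs {total_previous}")
```

Under that assumption, the per-label differences sum to the difference between the two totals reported in the Total row.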
Relationships:
Type | RelCount | RelCount previous version | NumChanges |
---|---|---|---|
BODYPARTOVEREXPRESSESGENE | 97772 | 97772 | 0 |
BODYPARTUNDEREXPRESSESGENE | 102185 | 102185 | 0 |
CHEMICALBINDSGENE | 25726 | 11531 | 14195 |
CHEMICALDECREASESEXPRESSION | 21051 | 21051 | 0 |
CHEMICALINCREASESEXPRESSION | 18713 | 18713 | 0 |
DISEASELOCALIZESTOANATOMY | 33 | 29 | 4 |
DRUGCAUSESEFFECT | 2 | 2 | 0 |
DRUGINCLASS | 1945 | 1029 | 916 |
DRUGTREATSDISEASE | 9 | 9 | 0 |
GENEASSOCIATEDWITHCELLULARCOMPONENT | 88880 | 73553 | 15327 |
GENEASSOCIATESWITHDISEASE | 508 | 502 | 6 |
GENECOVARIESWITHGENE | 61606 | 61606 | 0 |
GENEHASMOLECULARFUNCTION | 104752 | 97191 | 7561 |
GENEINPATHWAY | 178991 | 179433 | -442 |
GENEINTERACTSWITHGENE | 147088 | 147001 | 87 |
GENEPARTICIPATESINBIOLOGICALPROCESS | 548285 | 559385 | -11100 |
GENEREGULATESGENE | 263978 | 265667 | -1689 |
SYMPTOMMANIFESTATIONOFDISEASE | 53 | 79 | -26 |
TRANSCRIPTIONFACTORINTERACTSWITHGENE | 6910 | | 6910 |
TOTAL | 1668487 | 1636738 | 31749 |
The full database dump can be downloaded from the following link: https://cedars.box.com/v/alzkb-v2-0-0
Instructions for installing from the CYPHERL file can be found here.