Version 2.0.0

@xixilili released this 03 Sep 17:27
· 1 commit to master since this release

What's Changed (Aug 31, 2024)

  1. Data Updates in AlzKB:

    • DrugBank: Updated to version 5.1.12 (2024-03-14)
    • NCBI Gene: Updated to V2024-05-13
    • Gene Ontology: Updated to V2024-04-24*
    • MeSH: Updated to V2023-12*
    • Uberon: Updated to V2024-03-22*
    • DrugCentral: Updated to V2023-11-01*
    • BindingDB: Updated to V2024-05*
    • MEDLINE: Updated to V2024-05-02*
      *Updates based on Hetionet. See the alzkb-updates GitHub repository for more details.
  2. Enhancements:

    • Added TranscriptionFactor nodes and TRANSCRIPTIONFACTORINTERACTSWITHGENE relationships.

    • Added chromosome number as a property to gene nodes.

    • Added sourcedatabase as a property to nodes.

    • Added correlation, score, p_fisher, z_score, affinity_nm, confidence, sourcedatabase, and unbiased as properties to relationships, sourced from Hetionet, DisGeNET, and DoRothEA.

      | Relationship | Properties |
      |---|---|
      | CHEMICALBINDSGENE | sourceDB, unbiased, affinity_nM |
      | CHEMICALDECREASESEXPRESSION | sourceDB, unbiased, z_score |
      | CHEMICALINCREASESEXPRESSION | sourceDB, unbiased, z_score |
      | DISEASELOCALIZESTOANATOMY | sourceDB, unbiased, p_fisher |
      | GENEASSOCIATESWITHDISEASE | sourceDB, score |
      | GENECOVARIESWITHGENE | sourceDB, unbiased, correlation |
      | SYMPTOMMANIFESTATIONOFDISEASE | sourceDB, unbiased, p_fisher |
      | TRANSCRIPTIONFACTORINTERACTSWITHGENE | sourceDB, confidence |

    Instructions for adding new data resources and importing data into the Memgraph graph database are available in the alzkb GitHub repository.
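
As a rough illustration of how the relationship properties listed above might be used, the sketch below builds a parameterized Cypher query that filters one relationship type on one of its listed properties. The relationship and property names are copied from these notes; the helper name `build_property_filter`, the `$threshold` parameter, the returned columns, and the relationship direction are illustrative assumptions, not part of AlzKB.

```python
# Properties listed per relationship type in the AlzKB 2.0.0 release notes.
REL_PROPERTIES = {
    "CHEMICALBINDSGENE": ["sourceDB", "unbiased", "affinity_nM"],
    "CHEMICALDECREASESEXPRESSION": ["sourceDB", "unbiased", "z_score"],
    "CHEMICALINCREASESEXPRESSION": ["sourceDB", "unbiased", "z_score"],
    "DISEASELOCALIZESTOANATOMY": ["sourceDB", "unbiased", "p_fisher"],
    "GENEASSOCIATESWITHDISEASE": ["sourceDB", "score"],
    "GENECOVARIESWITHGENE": ["sourceDB", "unbiased", "correlation"],
    "SYMPTOMMANIFESTATIONOFDISEASE": ["sourceDB", "unbiased", "p_fisher"],
    "TRANSCRIPTIONFACTORINTERACTSWITHGENE": ["sourceDB", "confidence"],
}

ALLOWED_OPS = {"<", "<=", ">", ">=", "="}


def build_property_filter(rel_type, prop, op):
    """Return a Cypher query string filtering rel_type on prop (hypothetical helper)."""
    if prop not in REL_PROPERTIES.get(rel_type, []):
        raise ValueError(f"{prop!r} is not listed for {rel_type!r}")
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    # The type and property names are checked against the whitelist above,
    # so they can be interpolated; the threshold stays a query parameter.
    return (
        f"MATCH (a)-[r:{rel_type}]->(b) "
        f"WHERE r.{prop} {op} $threshold "
        "RETURN a, r, b"
    )


query = build_property_filter("CHEMICALBINDSGENE", "affinity_nM", "<=")
print(query)
```

The resulting string, together with a value for `threshold`, could then be sent through whichever Bolt-compatible client is used to connect to Memgraph.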

  3. Data Quality Improvements:

    • Removed the mapping between Creutzfeldt-Jakob disease (CJD) and Familial Alzheimer Disease (FAD). CJD and FAD are distinct diseases, but they had been merged into a single node in AlzKB because the DisGeNET “disease_mappings.tsv” file maps CJD to FAD.
    • Filtered genes to keep human genes only (tax-id = 9606).
    • Implemented case-insensitive matching when extracting Alzheimer’s data from DisGeNET to include disease names that are in all caps.
    • Consolidated pathways with the same names but different values of pathwayid and sourcedatabase.
    • Removed duplicated pathways from AOP-DB that have “Homo sapiens (human)” in their names.
    • Removed 21,724 Drug nodes from AOP-DB that had only xrefmesh values, NULL as commonName, and no connections to any other nodes.
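
Two of the filters above, keeping human genes only and matching disease names case-insensitively, can be sketched roughly as follows. This is a minimal illustration and not the AlzKB build code: the record layout (dicts with `symbol` and `tax_id` keys), the helper names, the sample records, and the match term `"alzheimer"` are all assumptions.

```python
HUMAN_TAX_ID = 9606  # tax-id for human, as stated in the notes above


def keep_human_genes(genes):
    """Keep only gene records whose tax_id is the human tax-id."""
    return [g for g in genes if int(g["tax_id"]) == HUMAN_TAX_ID]


def is_alzheimer_name(name, term="alzheimer"):
    """Case-insensitive substring match, so all-caps names are included."""
    return term.casefold() in name.casefold()


# Made-up sample records, for illustration only.
genes = [
    {"symbol": "APOE", "tax_id": "9606"},
    {"symbol": "Apoe", "tax_id": "10090"},  # non-human record, dropped
]
disease_names = [
    "Alzheimer's Disease",
    "ALZHEIMER DISEASE, FAMILIAL",  # all caps, missed by a case-sensitive match
    "Creutzfeldt-Jakob Disease",
]

human_genes = keep_human_genes(genes)
alz_names = [n for n in disease_names if is_alzheimer_name(n)]
print(human_genes)
print(alz_names)
```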

Summary of the changes in nodes and relationships
Nodes:

| Label | NodeCount | NodeCount (previous version) | NumChanges |
|---|---|---|---|
| BiologicalProcess | 12322 | 11381 | 941 |
| BodyPart | 652 | 402 | 250 |
| CellularComponent | 1695 | 1391 | 304 |
| Disease | 34 | 20 | 14 |
| Drug | 16581 | 36959 | -20378 |
| DrugClass | 474 | 345 | 129 |
| Gene | 193279 | 193313 | -34 |
| MolecularFunction | 3460 | 2884 | 576 |
| Pathway | 4516 | 4570 | -54 |
| Symptom | 505 | 438 | 67 |
| TranscriptionFactor | 519 | 0 (new in 2.0.0) | 519 |
| Total | 234037 | 251703 | -17666 |
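
The NumChanges column is the current count minus the previous count, and the Total row is the sum of each column. A small sketch of that arithmetic, using the figures from the node table above, is given here. TranscriptionFactor is treated as having a previous count of 0, an assumption based on the label being new in this release; the variable names are illustrative.

```python
# (current, previous) node counts per label, copied from the node table.
node_counts = {
    "BiologicalProcess": (12322, 11381),
    "BodyPart": (652, 402),
    "CellularComponent": (1695, 1391),
    "Disease": (34, 20),
    "Drug": (16581, 36959),
    "DrugClass": (474, 345),
    "Gene": (193279, 193313),
    "MolecularFunction": (3460, 2884),
    "Pathway": (4516, 4570),
    "Symptom": (505, 438),
    "TranscriptionFactor": (519, 0),  # assumption: new label, no previous count
}

# Per-label change: current minus previous.
changes = {label: cur - prev for label, (cur, prev) in node_counts.items()}

# Column totals, which should match the Total row.
total_current = sum(cur for cur, _ in node_counts.values())
total_previous = sum(prev for _, prev in node_counts.values())

print(changes["Drug"])
print(total_current, total_previous, total_current - total_previous)
```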

Relationships:

| Type | RelCount | RelCount (previous version) | NumChanges |
|---|---|---|---|
| BODYPARTOVEREXPRESSESGENE | 97772 | 97772 | 0 |
| BODYPARTUNDEREXPRESSESGENE | 102185 | 102185 | 0 |
| CHEMICALBINDSGENE | 25726 | 11531 | 14195 |
| CHEMICALDECREASESEXPRESSION | 21051 | 21051 | 0 |
| CHEMICALINCREASESEXPRESSION | 18713 | 18713 | 0 |
| DISEASELOCALIZESTOANATOMY | 33 | 29 | 4 |
| DRUGCAUSESEFFECT | 2 | 2 | 0 |
| DRUGINCLASS | 1945 | 1029 | 916 |
| DRUGTREATSDISEASE | 9 | 9 | 0 |
| GENEASSOCIATEDWITHCELLULARCOMPONENT | 88880 | 73553 | 15327 |
| GENEASSOCIATESWITHDISEASE | 508 | 502 | 6 |
| GENECOVARIESWITHGENE | 61606 | 61606 | 0 |
| GENEHASMOLECULARFUNCTION | 104752 | 97191 | 7561 |
| GENEINPATHWAY | 178991 | 179433 | -442 |
| GENEINTERACTSWITHGENE | 147088 | 147001 | 87 |
| GENEPARTICIPATESINBIOLOGICALPROCESS | 548285 | 559385 | -11100 |
| GENEREGULATESGENE | 263978 | 265667 | -1689 |
| SYMPTOMMANIFESTATIONOFDISEASE | 53 | 79 | -26 |
| TRANSCRIPTIONFACTORINTERACTSWITHGENE | 6910 | 0 (new in 2.0.0) | 6910 |
| Total | 1668487 | 1636738 | 31749 |

The full database dump can be downloaded from the following link: https://cedars.box.com/v/alzkb-v2-0-0

Instructions for installing from the CYPHERL file can be found here.