1
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Paul Groth - @pgroth
Elsevier Labs
BigNet : WWW 2018
Thanks to Ron Daniel, Brad Allen & the Labs Team
2
Outline
Goal: to tell you our current thinking and to get your feedback
• Why we’re interested
• What we’ve tried
• What we’re missing
• Webby Data
• 2 Sources of Semantics
• State of the art
• What’s missing
Warning: the back half is a (probably incomplete) literature review, so think of it as a set of pointers
3
Knowledge Graphs
4
5
EMMeT (Elsevier Merged Medical Taxonomy)
EMMeT is a multilingual, concept-based clinical ontology
• Multilingual: English, French, Spanish
• Concept-based: All terms, synonyms, translations, mappings are related to a unique identifier (“IMUI”)
• Ontology: Provides semantic relationships between concepts (symptoms of a disease, treatment procedures of a disease, complications of a disease or a procedure, etc.)
EMMeT is a controlled reference terminology
• Based on Unified Medical Language System (UMLS), standard clinical terminologies as well as Elsevier proprietary vocabularies and lists of acronyms
• Explicitly mapped to international medical standards (SNOMED-CT, ICD-9-CM, ICD-10-CM, LOINC, RxNorm, CVX, etc.) and Elsevier’s vocabularies (Gold Standard, EMTREE, etc.)
EMMeT is current
• Continuously updated, and released every 12 weeks for automatic indexing
• Updated daily and available via an API for manual tagging access
• Maintained by a team of medical terminology experts
6
Automated Tagging
Manual Tagging/
Data Structuring
Products and platforms using EMMeT
Clinical Solutions
ClinicalKey Global
ClinicalKey ANZ
ClinicalKey France
ClinicalKey Espanol
ClinicalKey Nursing
ClinicalKey German
ClinicalKey Nursing ANZ
ClinicalKey Brazil
Amirsys Decision Point
RP/STMJ
Health Advance
The Lancet
Cell
LexisNexis
MedMal Nav
LN Insight
Legend
In production
In Pilot
In Pipeline
Nursing Education
Mosby’s Dictionary
Clinical Solutions
PoC - Clinical Overviews
ClinicalKey HL7 API
Health Analytics
IDS FHIR API/Apps
Dorland’s Dictionary
Patient Engagement
Gold Standard CP
ERC
Content 2.0
Nursing Education
Sherpath
EMEALAAP
MedEnact
RP/STMJ (SCT)
Health Advance
The Lancet
Cell
7
EMMeT Clinical Knowledge Graph
8
Rankings of EMMeT’s ontological relationships
• Relationships are ranked according to a 5-tiered ranking model, chosen for simplicity and accessibility.
• 10: best option
• 9: second option, used when a rank of 10 is not applicable
• 8: given to concepts that are too general to be directly related to a specific disease
• 7: used as an outlier
• 6: default / non-validated
Relationship | Ranking criteria: 10 | 9 | 8 | 7
has cause | most common | common | sometimes | rare
has clinical finding | most common | common | sometimes | rare
has_complication severity (disease) | severe/death | high | moderate | low morbidity
has_complication prevalence (disease) | strong occurrence/high prevalence | likely occurrence/commonly prevalent | sometimes occurs | rare occurrence
has_complication severity (procedure) | critical/death | major | moderate | minor
has comorbidity | strongly associated | commonly associated | sometimes associated | rarely associated
has screening procedure | best choice | is done | sometimes done | rarely done
has risk factor | strongly associated | commonly associated | sometimes associated | rarely associated
has diagnostic procedure | best choice | commonly done | sometimes done | rarely done
has differential diagnosis | strong occurrence/high prevalence | likely occurrence/commonly prevalent | sometimes occurs/low prevalence | rare occurrence
has drug | best choice | 2nd line | 3rd line | rarely given
has contraindication drug | strongly avoid/black box | commonly avoid | sometimes avoid | rarely avoid
has treatment procedure | best choice | commonly done | sometimes done | rarely done
has prevention | best option | common option | sometimes advised | rarely advised
has physician specialty | specific specialty | general/specialty | broad | rare
has device | standard device | acceptable device | sometimes used | rarely used
9
From EMMeT to H-Graph
• Based on EMMeT
• Support more complex relations including patient context (Clinical Overview content + more)
• Flexible and extensible model to support links to content, model treatment strategies, numeric values, temporal data, etc. Age, sex, weight, … are very simple context.
“In people with atrial fibrillation presenting acutely without life-threatening haemodynamic instability, offer rate or rhythm control if the onset of the arrhythmia is less than 48 hours, and start rate control if it is more than 48 hours or is uncertain.” (NICE Guideline, Atrial Fibrillation: Management)
• Continue to support existing indexing pipelines (e.g. ClinicalKey), and tagging use cases (e.g. Clinical Overviews)
From EMMeT… …To H-Graph
10
Universal Schemas
11
Universal schemas
• … are a specific technique from the Information Extraction and the Automatic Knowledge Base Completion literature
• … are an unsupervised method to ‘learn’ by combining text extracts with existing knowledge base assertions
• Applications:
• Extend a medical knowledge base
• scan incoming literature to suggest new additions to EMMeT and show the underlying evidence to the taxonomy editor.
• scan literature backlog to find evidence for data already in EMMeT
• Literature Surveillance
• scan incoming literature to find existing facts even if expressed in very different ways
• find new concepts in the literature related to an existing EMMeT concept*. Let taxonomy editor decide whether to add new concept and relation to EMMeT
12
Open Information Extraction
• Knowledge bases are populated by scanning text and doing Information Extraction
• Most information extraction systems are looking for very specific things, like drug-drug interactions
• Best accuracy for that one kind of data, but misses out on all the other concepts and relations in the text
• For a broad knowledge base, use Open Information Extraction, which relies only on some knowledge of grammar
• One weird trick for open information extraction …
• ReVerb*:
1. Find “relation phrases” starting with a verb and ending with a verb or preposition
2. Find noun phrases before and after the relation phrase
3. Discard relation phrases not used with multiple combinations of arguments.
In addition, brain scans were performed to exclude
other causes of dementia.
* Fader et al. Identifying Relations for Open Information Extraction
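A minimal sketch of steps 1 and 2 above, applied to the example sentence. It is not the ReVerb implementation: the part-of-speech tags are supplied by hand (Penn Treebank style), the relation pattern is simplified to a run of verbs, adverbs and prepositions, noun phrases are simple determiner/adjective/noun runs, and step 3 (the check across many argument pairs) is left out because it needs a corpus. The names `extract_triples` and `SENTENCE` are made up for this illustration.

```python
# Simplified, ReVerb-inspired extraction over hand-tagged tokens (illustrative only).
VERB_OR_PREP = ("VB", "IN", "TO", "RP")   # tags allowed to end a relation phrase
REL_INTERIOR = VERB_OR_PREP + ("RB",)     # adverbs may also appear inside it
NP_TAGS = ("NN", "JJ", "DT", "CD")        # very rough noun-phrase chunking

SENTENCE = [
    ("In", "IN"), ("addition", "NN"), (",", ","), ("brain", "NN"),
    ("scans", "NNS"), ("were", "VBD"), ("performed", "VBN"), ("to", "TO"),
    ("exclude", "VB"), ("other", "JJ"), ("causes", "NNS"), ("of", "IN"),
    ("dementia", "NN"), (".", "."),
]


def extract_triples(tokens):
    """Return (arg1, relation phrase, arg2) triples found in one tagged sentence."""
    def words(start, stop):
        return " ".join(word for word, _ in tokens[start:stop])

    triples = []
    i = 0
    while i < len(tokens):
        if not tokens[i][1].startswith("VB"):
            i += 1
            continue
        # Step 1: grow the relation phrase from a verb, then trim trailing
        # adverbs so that it ends with a verb or a preposition.
        j = i
        while j < len(tokens) and tokens[j][1].startswith(REL_INTERIOR):
            j += 1
        end = j
        while end > i and not tokens[end - 1][1].startswith(VERB_OR_PREP):
            end -= 1
        # Step 2: take the noun phrases directly before and after the phrase.
        left = i
        while left > 0 and tokens[left - 1][1].startswith(NP_TAGS):
            left -= 1
        right = end
        while right < len(tokens) and tokens[right][1].startswith(NP_TAGS):
            right += 1
        if left < i and right > end:
            triples.append((words(left, i), words(i, end), words(end, right)))
        i = j
    return triples


print(extract_triples(SENTENCE))
```

Under these simplifications the sentence yields one triple, with "brain scans" and "other causes" as arguments of "were performed to exclude". A real system would use an automatic tagger and a fuller noun-phrase chunker.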
13
ReVerb output
After ReVerb pulls out noun phrases, match them up to EMMeT concepts.
Discard rare concepts, relations, or relations that are not used with many different concepts.
SD documents scanned: 14,000,000
Extracted ReVerb triples: 473,350,566
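The matching and filtering steps can be sketched roughly as follows. This is an assumption-laden toy: the concept dictionary, its identifiers (`C1` to `C4`, not real IMUIs), the exact-match lookup and the `min_pairs` threshold are all invented for illustration, and the slides do not say how matching or thresholds were actually done.

```python
from collections import defaultdict

# Hypothetical lookup from normalized noun phrase to concept identifier.
CONCEPTS = {"aspirin": "C1", "headache": "C2", "ibuprofen": "C3", "fever": "C4"}

RAW_TRIPLES = [
    ("Aspirin", "is used to treat", "headache"),
    ("ibuprofen", "is used to treat", "fever"),
    ("aspirin", "was bought at", "the pharmacy"),
    ("aspirin", "is rarely blamed for", "fever"),
]


def resolve_concepts(triples, concepts):
    """Keep only triples whose two arguments both match a known concept."""
    resolved = []
    for subj, rel, obj in triples:
        s = concepts.get(subj.strip().lower())
        o = concepts.get(obj.strip().lower())
        if s is not None and o is not None:
            resolved.append((s, rel, o))
    return resolved


def keep_reusable_relations(triples, min_pairs=2):
    """Drop relations seen with fewer than min_pairs distinct concept pairs."""
    pairs_by_relation = defaultdict(set)
    for subj, rel, obj in triples:
        pairs_by_relation[rel].add((subj, obj))
    return [t for t in triples if len(pairs_by_relation[t[1]]) >= min_pairs]


filtered = keep_reusable_relations(resolve_concepts(RAW_TRIPLES, CONCEPTS))
print(filtered)
```

Here the pharmacy triple is dropped because one argument has no concept, and "is rarely blamed for" is dropped because it appears with only one concept pair.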
14
Universal schemas - Initialization
• Method to combine ‘facts’ found by machine reading with stronger assertions from the ontology.
• Build an E×R matrix with entity pairs as rows and relations as columns.
• Relation columns can come from EMMeT, or from ReVerb extractions.
• Cells contain 1.0 if that pair of entities is connected by that relation.
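The initialization described above might look like the following sketch. The triples are toy data written for this example (they are not taken from EMMeT or from the ReVerb output), and `build_matrix` is a hypothetical helper name; a production version would use a sparse matrix rather than nested lists.

```python
def build_matrix(kb_triples, text_triples):
    """Build the entity-pair x relation matrix from KB and text triples."""
    triples = list(kb_triples) + list(text_triples)
    pairs = sorted({(subj, obj) for subj, _, obj in triples})
    relations = sorted({rel for _, rel, _ in triples})
    row = {pair: i for i, pair in enumerate(pairs)}
    col = {rel: j for j, rel in enumerate(relations)}
    matrix = [[0.0] * len(relations) for _ in pairs]
    for subj, rel, obj in triples:
        # 1.0 marks that this entity pair was seen with this relation.
        matrix[row[(subj, obj)]][col[rel]] = 1.0
    return pairs, relations, matrix


# Structured relations (ontology side) and surface-form relations (text side).
KB_TRIPLES = [
    ("glaucoma", "has_drug", "timolol"),
    ("migraine", "has_drug", "sumatriptan"),
]
TEXT_TRIPLES = [
    ("glaucoma", "is treated with", "timolol"),
    ("migraine", "is treated with", "sumatriptan"),
    ("hypertension", "is treated with", "lisinopril"),
]

pairs, relations, matrix = build_matrix(KB_TRIPLES, TEXT_TRIPLES)
for pair, cells in zip(pairs, matrix):
    print(pair, dict(zip(relations, cells)))
```

The row for (hypertension, lisinopril) has the text relation but no ontology relation, which is exactly the kind of gap the prediction step tries to fill.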
15
Universal schemas - Prediction
• Factorize the matrix into E×K and K×R factors, then recombine.
• “Learns” the correlations between text relations and EMMeT relations, in the context of pairs of objects.
• Find new triples to go into EMMeT, e.g. (glaucoma, has_alternativeProcedure, biofeedback)
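To illustrate factorize-then-recombine, here is a small sketch that uses a truncated SVD from NumPy as a stand-in. The slides do not state which factorization model or training objective was used, so SVD is purely an assumption chosen for brevity; the matrix, the rank and the 0.3 threshold are likewise toy values.

```python
import numpy as np

PAIRS = [("glaucoma", "timolol"), ("hypertension", "lisinopril"),
         ("migraine", "sumatriptan"), ("headache", "migraine")]
RELATIONS = ["has_drug", "is treated with", "is a symptom of"]

# Rows are entity pairs, columns are relations; 1.0 means "observed".
M = np.array([[1, 1, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)


def low_rank_scores(matrix, rank):
    """Factorize into (E x K) and (K x R) parts and multiply them back."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    pair_factors = u[:, :rank] * s[:rank]      # E x K
    relation_factors = vt[:rank, :]            # K x R
    return pair_factors @ relation_factors


def candidates(matrix, scores, threshold):
    """List unobserved cells whose reconstructed score passes the threshold."""
    found = []
    for i, j in zip(*np.where(matrix == 0)):
        if scores[i, j] >= threshold:
            found.append((PAIRS[i], RELATIONS[j], float(scores[i, j])))
    return sorted(found, key=lambda c: -c[2])


scores = low_rank_scores(M, rank=2)
cands = candidates(M, scores, threshold=0.3)
print(cands)
```

Because "is treated with" co-occurs with "has_drug" in the observed rows, the reconstruction gives the missing (hypertension, lisinopril, has_drug) cell a clearly positive score, while unrelated empty cells stay near zero. Such candidates would then go to a curator rather than straight into the ontology.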
16
[Figure: universal schema pipeline. Labels: Content, Taxonomy, Knowledge graph, Universal schema, Surface form relations, Structured relations, Factorization model, Predicted relations. Steps: Open Information Extraction, Triple Extraction, Entity Resolution, Concept Resolution, Matrix Construction, Matrix Factorization, Matrix Completion, Curation. Counts shown: 14M SD articles, 475M triples, 3.3 million relations, 49M relations, ~15k -> 1M entries.]
Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel
“Applying Universal Schemas for Domain Specific Ontology Expansion”
5th Workshop on Automated Knowledge Base Construction (AKBC) 2016
Michael Lauruhn, and Paul Groth. "Sources of Change for Modern
Knowledge Organization Systems." Knowledge Organization 43, no. 8
(2016).
ONTOLOGY MAINTENANCE
• Pretty good F measure, around 0.7
• Good enough with human in the loop
• But we want more!
17
Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation
methods." Semantic web 8.3 (2017): 489-508.
WHERE TO GO?
18
MORE THAN LINK PREDICTION
• Data has deep hierarchy – link prediction flattens this
• Data has hooks into specific content
• Schemas are increasingly richly defined – not just a single type
• N-ary relations
19
OUR KGS SHARE PROPERTIES WITH WEB KGS
Ringler, Daniel, and Heiko Paulheim. "One knowledge graph to rule them all? Analyzing
the differences between DBpedia, YAGO, Wikidata & co." Joint German/Austrian
Conference on Artificial Intelligence (Künstliche Intelligenz). Springer, Cham, 2017.
20
The Web of Data
http://webdatacommons.org/structureddata/2017-12/stats/stats.html
http://lodlaundromat.org
21
Two sources of semantics
1. Dereferenceability
2. Rules
22
Dereferenceability
Looking up definitions – natural language and programmatic
23
WIKIDATA VOCABULARY
24
Pay attention to the underlying data
Paul Groth, Michael Lauruhn, Antony Scerri: “Open Information Extraction on Scientific Text: An
Evaluation”, 2018; arXiv:1802.05574, http://arxiv.org/abs/1802.05574
25
Embed more
Gupta, N., Singh, S., & Roth, D. (2017). Entity linking via joint encoding of types,
descriptions, and context. In Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing (pp. 2681-2690).
26
Embed more
Both, Fabian, Steffen Thoma, and Achim Rettinger. "Cross-modal Knowledge Transfer:
Improving the Word Embedding of Apple by Looking at Oranges." Proceedings of the
Knowledge Capture Conference. ACM, 2017.
27
Social Semantics?
de Rooij, S., Beek, W., Bloem, P., van Harmelen, F., & Schlobach, S. (2016, October).
Are Names Meaningful? Quantifying Social Meaning on the Semantic Web.
In International Semantic Web Conference (pp. 184-199). Springer, Cham.
• Distributional semantics for identifiers (NTN)
• But uses the global network
• Could we use the discussion space as well?
NTN - Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013).
Reasoning with neural tensor networks for knowledge base
completion. In Advances in neural information processing
systems (pp. 926-934).
28
schema:dateModified a rdf:Property ;
rdfs:label
29
Injecting Background Knowledge as Constraints
Rocktäschel, T., Singh, S., & Riedel, S. (2015). Injecting logical background knowledge into embeddings for relation
extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies (pp. 1119-1129)
30
Learning Rules
Yang, Fan, Zhilin Yang, and William W. Cohen. "Differentiable learning of logical
rules for knowledge base reasoning." Advances in Neural Information Processing
Systems. 2017.
31
Combining Both – supporting complex reasoning with subsymbolic representations
Rocktäschel, T., & Riedel, S. (2017). End-to-end
differentiable proving. In Advances in Neural Information
Processing Systems (pp. 3791-3803).
32
Future
Welbl, J., Stenetorp, P., & Riedel, S. (2017). Constructing Datasets for
Multi-hop Reading Comprehension Across Documents. arXiv preprint
arXiv:1710.06481.
• Scale
• The knowledge base == text?
• Multi-hop reasoning
• Is everything end-to-end differentiable?
33
Conclusion
• In practice: data is webby data
• Messy
• Interconnected
• Constraints and rules associated
• Semantic Web: semantics can come from multiple different sources
• Explicit & implicit
• Take advantage of those sources
• Knowledge graphs benefit from inference
• Your thoughts?
• Thanks & We’re hiring!
p.groth@elsevier.com | pgroth.com
labs.elsevier.com
34
Backup
35
INTEGRATION OF LARGE NUMBERS OF DATA SOURCES
Groth, Paul, "The Knowledge-Remixing Bottleneck," IEEE Intelligent Systems, vol. 28, no. 5, pp. 44-48, Sept.-Oct. 2013. doi: 10.1109/MIS.2013.138
• 10 different extractors
• E.g. mapping-based infobox extractor
• Infobox extractor uses a hand-built ontology based on the 350 most commonly used English-language infoboxes
• Integrates with Yago
• Yago relies on Wikipedia + Wordnet
• Upper ontology from Wordnet and then a mapping to Wikipedia categories based on frequencies
• Wordnet is built by psycholinguists
36
Units & Measurement Annotations
• Time
• Dosage
• Probability
• Percent
• Count
• Not handled yet
Find numbers followed by a unit name or abbreviation (perhaps with scale factor like k, m, G, …). Provide value
normalized to SI units. Also provide type of measurement (time, temperature, length, mass, dosage, etc.) based on
unit. Handling tolerances, ranges, probabilities, and counts adds complexity. Conjunctions not yet handled but very
important.
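A rough sketch of the basic idea in the paragraph above: find a number followed by a unit, normalize it to SI, and report the measurement type. Everything here is an assumption made for illustration: the tiny unit table, the three scale prefixes, the regular expression, and the rule that mass per mass is labeled "dosage". Ranges, tolerances, conjunctions and the property being measured are not handled.

```python
import re

# Base units mapped to (measurement type, factor to the SI base unit).
# A deliberately tiny, hypothetical table; a real one would be far larger.
BASE_UNITS = {
    "s": ("time", 1.0),
    "min": ("time", 60.0),
    "h": ("time", 3600.0),
    "g": ("mass", 1e-3),      # the SI base unit of mass is the kilogram
    "m": ("length", 1.0),
}
PREFIXES = {"k": 1e3, "m": 1e-3, "G": 1e9}

# A number, optional whitespace, a unit, and optionally "/unit".
MEASUREMENT = re.compile(
    r"(?<![\w.])(\d+(?:\.\d+)?)\s*([A-Za-z]+)(?:/([A-Za-z]+))?"
)


def resolve(unit):
    """Return (type, SI factor) for a unit, trying whole units before prefixes."""
    if unit in BASE_UNITS:
        return BASE_UNITS[unit]
    if len(unit) > 1 and unit[0] in PREFIXES and unit[1:] in BASE_UNITS:
        kind, factor = BASE_UNITS[unit[1:]]
        return kind, PREFIXES[unit[0]] * factor
    return None


def annotate(text):
    """Return (matched text, measurement type, SI value) for each measurement."""
    annotations = []
    for match in MEASUREMENT.finditer(text):
        value = float(match.group(1))
        numerator = resolve(match.group(2))
        if numerator is None:
            continue
        kind, factor = numerator
        if match.group(3) is not None:
            denominator = resolve(match.group(3))
            if denominator is None:
                continue
            den_kind, den_factor = denominator
            # Assumption: mass per mass (e.g. mg/kg) is treated as a dosage.
            both_mass = (kind, den_kind) == ("mass", "mass")
            kind = "dosage" if both_mass else f"{kind}/{den_kind}"
            factor /= den_factor
        annotations.append((match.group(0), kind, value * factor))
    return annotations


example = ("Hoppers were removed 1 h before the administration of leptin "
           "(5 mg/kg or 2.5 mg/kg, ip).")
for item in annotate(example):
    print(item)
```

Even on the example sentences a pattern this naive misfires: a compound label such as "5g" would be read as five grams, which is one reason identifying the property being measured matters.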
Current work – identify the property being measured (e.g. dosages of AA, indomethacin, HtE, leptin, etc.)
Additionally at 120 min following glucose administration, the 100 mg/kg 5g and 5e groups had
significantly (P ⩽ 0.005) a greater drop in blood glucose than the 10 and 50 mg/kg groups.
In the mouse xenograft model of LLC cells in C57BL/6J mice, once daily administration of AA (50 and
100 mg/kg) inhibited tumor growth in a dose-dependent manner (Fig. 6A and C).
Groups of Swiss mice (n = 6) were treated (p.o.) with vehicle, indomethacin (10 mg/kg-Roche®) or HtE
(50, 100 or 200 mg/kg) 1 h before administration of carrageenan at 2.5% (Sigma-Aldrich®) injected
subcutaneously into the plantar region of the left hind paw and phosphate buffer saline (PBS) in
right hind paw.
In the experiments designed to study the antidepressant-like effect of the repeated treatment (for
14 days) of EET, the immobility time in the TST and the locomotor activity in the open-field were
assessed in independent groups of mice 24 h after the last daily administration of EET (10–100
mg/kg, p.o.).
Hoppers containing chow were removed from the cages 1 h before the administration of leptin
[depending on studies, 5 mg/kg or 2.5 mg/kg, ip; mouse recombinant leptin obtained from Dr. A.F.

Simplifying semantics for biomedical applications
Semantic Web San Diego
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biology
robertstevens65
 
Big Data: Learning from MIMIC- Celi
Big Data: Learning from MIMIC- CeliBig Data: Learning from MIMIC- Celi
Big Data: Learning from MIMIC- Celi
intensivecaresociety
 

Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs

  • 1. Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
       Paul Groth - @pgroth, Elsevier Labs
       BigNet : WWW 2018
       Thanks to Ron Daniel, Brad Allen & the Labs Team
       Empowering Knowledge™
  • 2. Outline
       Goal: to tell you our current thinking and to get your feedback
       • Why we’re interested
       • What we’ve tried
       • What we’re missing
       • Webby Data
       • 2 Sources of Semantics
       • State of the art
       • What’s missing
       Warning: the back half is something like a (probably incomplete) literature review, so think of it as pointers.
  • 4. 4
  • 5. EMMeT (Elsevier Merged Medical Taxonomy)
       EMMeT is a multilingual, concept-based clinical ontology
       • Multilingual: English, French, Spanish
       • Concept-based: All terms, synonyms, translations, mappings are related to a unique identifier (“IMUI”)
       • Ontology: Provides semantic relationships between concepts (symptoms of a disease, treatment procedures of a disease, complications of a disease or a procedure, etc.)
       EMMeT is a controlled reference terminology
       • Based on Unified Medical Language System (UMLS), standard clinical terminologies as well as Elsevier proprietary vocabularies and lists of acronyms
       • Explicitly mapped to international medical standards (SNOMED-CT, ICD-9-CM, ICD-10-CM, LOINC, RxNorm, CVX, etc.) and Elsevier’s vocabularies (Gold Standard, EMTREE, etc.)
       EMMeT is current
       • Continuously updated, and released every 12 weeks for automatic indexing
       • Updated daily and available via an API for manual tagging access
       • Maintained by a team of medical terminology experts,
  • 6. 6 Automated Tagging Manual Tagging/ Data Structuring Products and platforms using EMMeT Clinical Solutions ClinicalKey Global ClinicalKey ANZ ClinicalKey France ClinicalKey Espanol ClinicalKey Nursing ClinicalKey German ClinicalKey Nursing ANZ ClinicalKey Brazil Amirsys Decision Point RP/STMJ Health Advance The Lancet Cell LexisNexis MedMal Nav LN Insight Legend In production In Pilot In Pipeline Nursing Education Mosby’s Dictionary Clinical Solutions PoC - Clinical Overviews ClinicalKey HL7 API Health Analytics IDS FHIR API/Apps Dorland’s Dictionary Patient Engagement Gold Standard CP ERC Content 2.0 Nursing Education Sherpath EMEALAAP MedEnact RP/STMJ (SCT) Health Advance The Lancet Cell
  • 8. Rankings of EMMeT’s ontological relationships
       Relationships are ranked according to a 5-tiered ranking model, for simplicity and accessibility:
       • 10: best option
       • 9: second option, used when the rank of 10 is not applicable
       • 8: given to concepts that are too general to be directly related to a specific disease
       • 7: used as an outlier
       • 6: default / non-validated
       Relationship ranking criteria:
       Relationship | 10 | 9 | 8 | 7
       has cause | most common | common | sometimes | rare
       has clinical finding | most common | common | sometimes | rare
       has_complication severity (disease) | severe/death | high | moderate | low morbidity
       has_complication prevalence (disease) | strong occurrence/high prevalence | likely occurrence/commonly prevalent | sometimes occurs | rare occurrence
       has_complication severity (procedure) | critical/death | major | moderate | minor
       has comorbidity | strongly associated | commonly associated | sometimes associated | rarely associated
       has screening procedure | best choice | is done | sometimes done | rarely done
       has risk factor | strongly associated | commonly associated | sometimes associated | rarely associated
       has diagnostic procedure | best choice | commonly done | sometimes done | rarely done
       has differential diagnosis | strong occurrence/high prevalence | likely occurrence/commonly prevalent | sometimes occurs/low prevalence | rare occurrence
       has drug | best choice | 2nd line | 3rd line | rarely given
       has contraindication drug | strongly avoid/black box | commonly avoid | sometimes avoid | rarely avoid
       has treatment procedure | best choice | commonly done | sometimes done | rarely done
       has prevention | best option | common option | sometimes advised | rarely advised
       has physician specialty | specific specialty | general/specialty | broad | rare
       has device | standard device | acceptable device | sometimes used | rarely used
  • 9. From EMMeT to H-Graph
       • Based on EMMeT
       • Support more complex relations including patient context (Clinical Overview content + more)
       • Flexible and extensible model to support links to content, model treatment strategies, numeric values, temporal data, etc.
       • Continue to support existing indexing pipelines (e.g. ClinicalKey), and tagging use cases (e.g. Clinical Overviews)
       Age, sex, weight, … are very simple context. A more complex example:
       “In people with atrial fibrillation presenting acutely without life-threatening haemodynamic instability, offer rate or rhythm control if the onset of the arrhythmia is less than 48 hours, and start rate control if it is more than 48 hours or is uncertain.” (NICE Guideline, Atrial Fibrillation: Management)
       Diagram labels: From EMMeT… …To H-Graph
  • 11. Universal schemas
       • … are a specific technique from the Information Extraction and the Automatic Knowledge Base Completion literature
       • … are an unsupervised method to ‘learn’ by combining text extracts with existing knowledge base assertions
       • Applications:
         • Extend a medical knowledge base
           • scan incoming literature to suggest new additions to EMMeT and show the underlying evidence to the taxonomy editor
           • scan literature backlog to find evidence for data already in EMMeT
         • Literature Surveillance
           • scan incoming literature to find existing facts even if expressed in very different ways
           • find new concepts in the literature related to an existing EMMeT concept*. Let taxonomy editor decide whether to add new concept and relation to EMMeT
  • 12. Open Information Extraction
       • Knowledge bases are populated by scanning text and doing Information Extraction
       • Most information extraction systems are looking for very specific things, like drug-drug interactions
       • Best accuracy for that one kind of data, but misses out on all the other concepts and relations in the text
       • For broad knowledge base, use Open Information Extraction that only uses some knowledge of grammar
       • One weird trick for open information extraction …
       • ReVerb*:
         1. Find “relation phrases” starting with a verb and ending with a verb or preposition
         2. Find noun phrases before and after the relation phrase
         3. Discard relation phrases not used with multiple combinations of arguments.
       Example sentence: In addition, brain scans were performed to exclude other causes of dementia.
       * Fader et al. Identifying Relations for Open Information Extraction
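The first two ReVerb steps listed on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ReVerb itself: the part-of-speech tags are supplied by hand (a real pipeline would run a tagger), the tag sets and the `extract_triples` function are hypothetical names chosen for this sketch, and step 3 (the frequency filter) is left out.

```python
# Simplified ReVerb-style extraction over hand-tagged tokens (Penn Treebank tags).
VERB = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
REL_INNER = VERB | {"TO", "IN", "RB", "RP"}   # tokens allowed inside a relation phrase
REL_END = VERB | {"TO", "IN", "RP"}           # phrase must end in a verb or preposition
NOUN_PHRASE = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def words(tagged, start, end):
    """Join the surface forms of tagged[start:end]."""
    return " ".join(word for word, _ in tagged[start:end])


def extract_triples(tagged):
    """Return (noun phrase, relation phrase, noun phrase) triples."""
    triples = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1] not in VERB:
            i += 1
            continue
        # Step 1: grow the relation phrase, then trim it to end in a verb/preposition.
        j = i
        while j < n and tagged[j][1] in REL_INNER:
            j += 1
        while j > i + 1 and tagged[j - 1][1] not in REL_END:
            j -= 1
        # Step 2: take the adjacent noun phrases on both sides.
        left = i
        while left > 0 and tagged[left - 1][1] in NOUN_PHRASE:
            left -= 1
        right = j
        while right < n and tagged[right][1] in NOUN_PHRASE:
            right += 1
        if left < i and right > j:
            triples.append((words(tagged, left, i), words(tagged, i, j), words(tagged, j, right)))
        i = j
    return triples


# The example sentence from the slide, tagged by hand for this sketch.
SENTENCE = [
    ("In", "IN"), ("addition", "NN"), (",", ","), ("brain", "NN"), ("scans", "NNS"),
    ("were", "VBD"), ("performed", "VBN"), ("to", "TO"), ("exclude", "VB"),
    ("other", "JJ"), ("causes", "NNS"), ("of", "IN"), ("dementia", "NN"), (".", "."),
]
triples = extract_triples(SENTENCE)
print(triples)
```

The sketch stops the right-hand noun phrase at the preposition "of", so the second argument is "other causes" rather than "other causes of dementia"; a fuller noun-phrase chunker might choose differently.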
  • 13. ReVerb output
       • After ReVerb pulls out noun phrases, match them up to EMMeT concepts
       • Discard rare concepts, relations, or relations that are not used with many different concepts
       # SD Documents Scanned: 14,000,000
       Extracted ReVerb Triples: 473,350,566
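A rough sketch of the matching and filtering described on this slide, assuming a simple lowercase dictionary lookup for concept matching. The lexicon, the concept identifiers, the toy triples and the `min_pairs` threshold are all made up for illustration; they are not EMMeT data or medical facts, and the slides do not say how matching was actually done.

```python
from collections import defaultdict

# Hypothetical lexicon: surface form -> concept identifier (identifiers are invented).
LEXICON = {
    "glaucoma": "C001",
    "biofeedback": "C002",
    "dementia": "C003",
    "brain scans": "C004",
}


def resolve_and_filter(triples, lexicon, min_pairs=2):
    """Map arguments to concepts, then drop relations seen with too few concept pairs."""
    resolved = []
    for arg1, relation, arg2 in triples:
        c1 = lexicon.get(arg1.lower())
        c2 = lexicon.get(arg2.lower())
        if c1 is not None and c2 is not None:  # keep only triples with two known concepts
            resolved.append((c1, relation.lower(), c2))
    pairs_per_relation = defaultdict(set)
    for c1, relation, c2 in resolved:
        pairs_per_relation[relation].add((c1, c2))
    return [t for t in resolved if len(pairs_per_relation[t[1]]) >= min_pairs]


# Toy extractions, for illustration only.
RAW = [
    ("Glaucoma", "is treated with", "biofeedback"),
    ("dementia", "is treated with", "biofeedback"),
    ("brain scans", "rule out", "dementia"),
    ("the patient", "is treated with", "biofeedback"),
]
kept = resolve_and_filter(RAW, LEXICON)
print(kept)
```

Here "the patient" does not resolve to a concept, and "rule out" appears with only one concept pair, so both of those triples are dropped.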
  • 14. Universal schemas - Initialization
       • Method to combine ‘facts’ found by machine reading with stronger assertions from ontology.
       • Build E×R matrix with entity-pairs as rows and relations as columns.
       • Relation columns can come from EMMeT, or from ReVerb extractions.
       • Cells contain 1.0 if that pair of entities is connected by that relation.
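The initialization step can be sketched as below: a minimal example, assuming triples are plain `(subject, relation, object)` tuples. The `kb:` and `text:` prefixes, the placeholder entity names and the `build_matrix` function are hypothetical and only serve to show how ontology relations and surface-form relations end up as columns of the same matrix.

```python
def build_matrix(triples):
    """Build an entity-pair x relation matrix with 1.0 for every observed triple."""
    pairs = sorted({(s, o) for s, _, o in triples})
    relations = sorted({r for _, r, _ in triples})
    row = {pair: i for i, pair in enumerate(pairs)}
    col = {rel: j for j, rel in enumerate(relations)}
    matrix = [[0.0] * len(relations) for _ in pairs]
    for s, r, o in triples:
        matrix[row[(s, o)]][col[r]] = 1.0
    return pairs, relations, matrix


# Placeholder data: one curated assertion and two text extractions.
KB_TRIPLES = [("disease_a", "kb:has_drug", "drug_x")]
TEXT_TRIPLES = [
    ("disease_a", "text:is treated with", "drug_x"),
    ("disease_b", "text:is treated with", "drug_y"),
]
pairs, relations, matrix = build_matrix(KB_TRIPLES + TEXT_TRIPLES)
for pair, cells in zip(pairs, matrix):
    print(pair, cells)
```

The pair `(disease_b, drug_y)` has the text relation but no curated relation, which is exactly the kind of empty cell the prediction step tries to fill.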
  • 15. Universal schemas - Prediction
       • Factorize matrix to E×K and K×R, then recombine.
       • “Learns” the correlations between text relations and EMMeT relations, in the context of pairs of objects.
       • Find new triples to go into EMMeT, e.g., (glaucoma, has_alternativeProcedure, biofeedback)
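To make "factorize, then recombine" concrete, here is a small sketch that uses a rank-K truncated SVD, computed with plain power iteration, as a stand-in factorization. The slides do not state which objective or optimizer was used, so treat the method, the toy matrix and all function names as assumptions for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def top_right_singular_vectors(matrix, k, iterations=500):
    """Top-k right singular vectors via power iteration on M^T M with deflation."""
    transposed = [list(col) for col in zip(*matrix)]
    found = []
    for _ in range(k):
        v = [1.0] * len(matrix[0])
        for _ in range(iterations):
            for u in found:  # remove directions that were already extracted
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            v = matvec(transposed, matvec(matrix, v))
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        found.append(v)
    return found


def factorize(matrix, k):
    """Return (E x K entity-pair factors, K x R relation factors)."""
    relation_factors = top_right_singular_vectors(matrix, k)
    pair_factors = [[sum(m * x for m, x in zip(row, v)) for v in relation_factors]
                    for row in matrix]
    return pair_factors, relation_factors


def recombine(pair_factors, relation_factors):
    """Multiply the two factors back into a dense score matrix."""
    k, r = len(relation_factors), len(relation_factors[0])
    return [[sum(p[i] * relation_factors[i][j] for i in range(k)) for j in range(r)]
            for p in pair_factors]


# Columns: text "is treated with", kb has_drug, text "is caused by", kb has_cause.
MATRIX = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],  # seen in text only: candidate for kb has_drug
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
scores = recombine(*factorize(MATRIX, k=2))
candidate_score = scores[2][1]   # predicted strength of kb has_drug for row 2
unrelated_score = scores[2][3]   # kb has_cause for the same pair
print(round(candidate_score, 3), round(unrelated_score, 3))
```

Because the text relation co-occurs with the curated relation in other rows, the empty cell for the third pair receives a clearly positive score, while the unrelated relation stays near zero. In a human-in-the-loop setting such scores would rank suggestions for an editor rather than be added automatically.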
  • 16. ONTOLOGY MAINTENANCE
       Pipeline diagram labels: Content Universal schema Surface form relations Structured relations Factorization model Matrix Construction Open Information Extraction Entity Resolution Matrix Factorization Knowledge graph Curation Predicted relations Matrix Completion Taxonomy Triple Extraction Concept Resolution 14M SD articles 475 M triples 3.3 million relations 49 M relations ~15k -> 1M entries
       • Pretty good F measure, around 0.7
       • Good enough with human in the loop
       • But we want more!
       Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel. “Applying Universal Schemas for Domain Specific Ontology Expansion.” 5th Workshop on Automated Knowledge Base Construction (AKBC), 2016.
       Michael Lauruhn and Paul Groth. "Sources of Change for Modern Knowledge Organization Systems." Knowledge Organization 43, no. 8 (2016).
  • 17. WHERE TO GO?
       Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation methods." Semantic web 8.3 (2017): 489-508.
  • 18. MORE THAN LINK PREDICTION
       • Data has deep hierarchy; link prediction flattens this
       • Data has hooks into specific content
       • Schemas are increasingly richly defined, not just a single type
       • N-ary relations
  • 19. OUR KGS SHARE PROPERTIES WITH WEB KGS
       Ringler, Daniel, and Heiko Paulheim. "One knowledge graph to rule them all? Analyzing the differences between DBpedia, YAGO, Wikidata & co." Joint German/Austrian Conference on Artificial Intelligence (Künstliche Intelligenz). Springer, Cham, 2017.
  • 20. The Web of Data
       http://webdatacommons.org/structureddata/2017-12/stats/stats.html
       http://lodlaundromat.org
  • 21. Two sources of semantics
       1. Dereferenceability
       2. Rules
  • 22. Dereferenceability
       Looking definitions up: natural language and programmatic
  • 24. Pay attention to the underlying data
       Paul Groth, Michael Lauruhn, Antony Scerri. “Open Information Extraction on Scientific Text: An Evaluation”, 2018. arXiv:1802.05574, http://arxiv.org/abs/1802.05574
  • 25. Embed more
       Gupta, N., Singh, S., & Roth, D. (2017). Entity linking via joint encoding of types, descriptions, and context. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2681-2690).
  • 26. Embed more
       Both, Fabian, Steffen Thoma, and Achim Rettinger. "Cross-modal Knowledge Transfer: Improving the Word Embedding of Apple by Looking at Oranges." Proceedings of the Knowledge Capture Conference. ACM, 2017.
  • 27. Social Semantics?
       de Rooij, S., Beek, W., Bloem, P., van Harmelen, F., & Schlobach, S. (2016, October). Are Names Meaningful? Quantifying Social Meaning on the Semantic Web. In International Semantic Web Conference (pp. 184-199). Springer, Cham.
       • Distributional semantics for identifiers (NTN)
       • But uses the global network
       • Could we use the discussion space as well?
       NTN: Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. In Advances in neural information processing systems (pp. 926-934).
  • 28. Rules
       https://www.w3.org/TR/rdf11-mt/

       schema:dateModified a rdf:Property ;
           rdfs:label "dateModified" ;
           schema:domainIncludes schema:CreativeWork, schema:DataFeedItem ;
           schema:rangeIncludes schema:Date, schema:DateTime ;
           rdfs:comment "The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed." .

       schema:datePublished a rdf:Property ;
           rdfs:label "datePublished" ;
           schema:domainIncludes schema:CreativeWork ;
           schema:rangeIncludes schema:Date ;
           rdfs:comment "Date of first broadcast/publication." .

       schema:disambiguatingDescription a rdf:Property ;
           rdfs:label "disambiguatingDescription" ;
           schema:domainIncludes schema:Thing ;
           schema:rangeIncludes schema:Text ;
           rdfs:comment "A sub property of description. A short description of the item used to disambiguate from other, similar items. Information from other properties (in particular, name) may be necessary for the description to be useful for disambiguation." ;
           rdfs:subPropertyOf schema:description .
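As an illustration of rules as a source of semantics, the sketch below applies the RDFS entailment pattern rdfs7 from RDF 1.1 Semantics (if `p rdfs:subPropertyOf q` and `s p o`, then `s q o`) to the `schema:disambiguatingDescription` definition shown on this slide. Triples are plain string tuples rather than a real RDF library, and the `ex:slides` resource and its description are hypothetical.

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def apply_rdfs7(triples):
    """Add (s, q, o) for every (s, p, o) where p rdfs:subPropertyOf q, to a fixpoint."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_properties = [(s, o) for s, p, o in inferred if p == SUBPROPERTY_OF]
        for sub, sup in sub_properties:
            for s, p, o in list(inferred):
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


GRAPH = {
    # From the schema.org definition on the slide.
    ("schema:disambiguatingDescription", SUBPROPERTY_OF, "schema:description"),
    # Hypothetical instance data.
    ("ex:slides", "schema:disambiguatingDescription", "Talk given at BigNet, WWW 2018"),
}
closure = apply_rdfs7(GRAPH)
for triple in sorted(closure):
    print(triple)
```

Note that `schema:domainIncludes` and `schema:rangeIncludes` are deliberately not treated as `rdfs:domain` and `rdfs:range` here, so no type triples are inferred from them.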
  • 29. Injecting Background Knowledge as Constraints
       Rocktäschel, T., Singh, S., & Riedel, S. (2015). Injecting logical background knowledge into embeddings for relation extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1119-1129).
  • 30. Learning Rules
       Yang, Fan, Zhilin Yang, and William W. Cohen. "Differentiable learning of logical rules for knowledge base reasoning." Advances in Neural Information Processing Systems. 2017.
  • 31. Combining Both: supporting complex reasoning with subsymbolic representations
       Rocktäschel, T., & Riedel, S. (2017). End-to-end differentiable proving. In Advances in Neural Information Processing Systems (pp. 3791-3803).
  • 32. Future
       • Scale
       • The knowledge base == text?
       • Multi-hop reasoning
       • Is everything end-to-end differentiable?
       Welbl, J., Stenetorp, P., & Riedel, S. (2017). Constructing Datasets for Multi-hop Reading Comprehension Across Documents. arXiv preprint arXiv:1710.06481.
  • 33. Conclusion
       • In practice: data is webby data
         • Messy
         • Interconnected
         • Constraints and rules associated
       • Semantic Web: semantics can come from multiple different sources
         • Explicit & implicit
         • Take advantage of those sources
       • Knowledge graphs benefit from inference
       • Your thoughts?
       • Thanks & We’re hiring!
       [email protected] | pgroth.com | labs.elsevier.com
  • 35. INTEGRATION OF LARGE NUMBERS OF DATA SOURCES
       • 10 different extractors
       • E.g. mapping-based infobox extractor
       • Infobox uses a hand-built ontology based on the 350 commonly used English language infoboxes
       • Integrates with Yago
       • Yago relies on Wikipedia + Wordnet
       • Upper ontology from Wordnet and then a mapping to Wikipedia categories based on frequencies
       • Wordnet is built by psycholinguists
       Groth, Paul, "The Knowledge-Remixing Bottleneck," Intelligent Systems, IEEE, vol. 28, no. 5, pp. 44-48, Sept.-Oct. 2013. doi: 10.1109/MIS.2013.138
  • 36. Units & Measurement
       Annotations (legend):
       • Time
       • Dosage
       • Probability
       • Percent
       • Count
       • Not handled yet
       Find numbers followed by a unit name or abbreviation (perhaps with scale factor like k, m, G, …). Provide value normalized to SI units. Also provide type of measurement (time, temperature, length, mass, dosage, etc.) based on unit.
       Handling tolerances, ranges, probabilities, and counts adds complexity. Conjunctions not yet handled but very important.
       Current work: identify the property being measured (e.g. dosages of AA, indomethacin, HtE, leptin, etc.)
       Example passages:
       • Additionally at 120 min following glucose administration, the 100 mg/kg 5g and 5e groups had significantly (P ⩽ 0.005) a greater drop in blood glucose than the 10 and 50 mg/kg groups.
       • In the mouse xenograft model of LLC cells in C57BL/6J mice, once daily administration of AA (50 and 100 mg/kg) inhibited tumor growth in a dose-dependent manner (Fig. 6A and C).
       • Groups of Swiss mice (n = 6) were treated (p.o.) with vehicle, indomethacin (10 mg/kg-Roche®) or HtE (50, 100 or 200 mg/kg) 1 h before administration of carrageenan at 2.5% (Sigma-Aldrich®) injected subcutaneously into the plantar region of the left hind paw and phosphate buffer saline (PBS) in right hind paw.
       • In the experiments designed to study the antidepressant-like effect of the repeated treatment (for 14 days) of EET, the immobility time in the TST and the locomotor activity in the open-field were assessed in independent groups of mice 24 h after the last daily administration of EET (10–100 mg/kg, p.o.).
       • Hoppers containing chow were removed from the cages 1 h before the administration of leptin [depending on studies, 5 mg/kg or 2.5 mg/kg, ip; mouse recombinant leptin obtained from Dr. A.F.
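The basic annotation step described on this slide (a number followed by a unit, normalized and typed by unit) might look like the sketch below. The three-entry unit table, the regular expression and the `annotate` function are assumptions made for illustration; the choice to express mg/kg as a dimensionless kg/kg ratio is also an assumption. Ranges, conjunctions ("50 and 100 mg/kg") and scale prefixes are not handled.

```python
import re

# Hypothetical unit table: unit -> (measurement type, factor to SI base units).
UNITS = {
    "mg/kg": ("dosage", 1e-6),   # kg of substance per kg of body mass
    "min": ("time", 60.0),       # seconds
    "h": ("time", 3600.0),       # seconds
}

# A number not glued to a preceding word or decimal point, then a known unit.
PATTERN = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)\s*(mg/kg|min|h)\b")


def annotate(text):
    """Return (value, unit, measurement type, SI value) for each number-unit match."""
    found = []
    for match in PATTERN.finditer(text):
        value = float(match.group(1))
        unit = match.group(2)
        kind, factor = UNITS[unit]
        found.append((value, unit, kind, value * factor))
    return found


# Excerpt from the last example passage on the slide.
TEXT = ("Hoppers containing chow were removed from the cages 1 h before the "
        "administration of leptin [depending on studies, 5 mg/kg or 2.5 mg/kg, ip")
annotations = annotate(TEXT)
for item in annotations:
    print(item)
```

Compound names such as "5g" in the first example passage show why a naive unit table is risky: adding "g" as a unit would turn a group label into a mass, which is one reason the property being measured matters.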

Editor's Notes

  • #6: 100+ years of expert knowledge
  • #8: On the left side we see one concept, breast cancer, and a number of pieces of information about it such as synonyms, parent and child concepts, etc. On the right we see some ontological relations from breast cancer to other concepts, such as (breast cancer, has diagnostic procedure, breast biopsy). One of the major differences between EMMeT and what is in UMLS is that we not only provide the basic 3-part relationship, such as (breast cancer, has_treatment, radical mastectomy), we also provide information about the ‘strength’ of that relation according to current medical evidence.
  • #10: Excerpt from the National Institute for Health and Care Excellence: "In people with atrial fibrillation presenting acutely without life-threatening haemodynamic instability, offer rate or rhythm control if the onset of the arrhythmia is less than 48 hours, and start rate control if it is more than 48 hours or is uncertain."
  • #13: Using EMMeT, and some code and data we already had, he built a quick prototype and tested it. Performance (in terms of accuracy of predictions) was surprisingly high. Unsupervised is very important because it means the construction of the rough underlying knowledge base is scalable and not limited by the availability of experts. Raw predictions are not good enough for fully automatic operation, but are plenty good enough to help taxonomy editors and other people do their jobs much faster.
  • #20: Complex axioms. Messy. Integrates lots of information.
  • #26: Predict entity types
  • #27: Concept similarity. Conc, SVD, and PCA are combinations.
  • #28: Test the null hypothesis that names are statistically independent of the two meaning proxies.
  • #31: SRL performance. TRL 2. And inductive logic programming.
  • #32: Translate to natural language (sli)
  • #37: One type of NLP annotation Labs is implementing is to mark up measurements – find the quantity, the unit, any tolerances, etc. We also normalize them to SI standards so measurements can be compared and searched. This is not novel research. However, we have not found prior work that attempts to detect the specific object and property being measured. We are using several domain-specific scenarios (mouse cancer, concrete additives, NLP algorithm accuracy, neuronal properties) to find ways that information is expressed. For mouse cancer, it is relatively easy to detect that a measurement is a dosage of a particular drug. But those patterns are of little use in the other scenarios. This work has application to the h-graph – dosages, ages, weights, etc. are all important properties for the patient context. Cohort size and probability are important for the quality of evidence measures.
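
The measurement markup described in this note (find a number followed by a unit, report the measurement type, normalize to SI) can be sketched roughly as below. This is only an illustrative sketch, not the Labs implementation: the `UNIT_TABLE` contents, the `Measurement` tuple, and the `find_measurements` function are all hypothetical names chosen for the example, and only four units are covered. Like the approach on the slide, it does not handle conjunctions or ranges, so in "10 and 50 mg/kg" only the 50 is picked up.

```python
import re
from typing import List, NamedTuple


class Measurement(NamedTuple):
    value: float     # number as written in the text
    unit: str        # unit as written in the text
    kind: str        # type of measurement inferred from the unit
    si_value: float  # value converted with the factor in UNIT_TABLE


# Hypothetical unit table (an assumption for this sketch):
# unit -> (measurement type, multiplication factor to the SI-style value).
UNIT_TABLE = {
    "mg/kg": ("dosage", 1e-6),  # kg of substance per kg of body mass
    "min": ("time", 60.0),      # seconds
    "h": ("time", 3600.0),      # seconds
    "%": ("percent", 0.01),     # fraction
}

# Longest units first so that "mg/kg" and "min" are tried before "h".
_units = "|".join(re.escape(u) for u in sorted(UNIT_TABLE, key=len, reverse=True))

# A number, optional whitespace, a known unit, and no letter right after
# the unit (so "1 hour" is not read as "1 h").
PATTERN = re.compile(rf"(\d+(?:\.\d+)?)\s*({_units})(?![A-Za-z])")


def find_measurements(text: str) -> List[Measurement]:
    """Return every number-plus-unit pair found in the text."""
    found = []
    for match in PATTERN.finditer(text):
        value = float(match.group(1))
        unit = match.group(2)
        kind, factor = UNIT_TABLE[unit]
        found.append(Measurement(value, unit, kind, value * factor))
    return found


if __name__ == "__main__":
    sentence = (
        "Additionally at 120 min following glucose administration, the "
        "100 mg/kg 5g and 5e groups had a greater drop in blood glucose "
        "than the 10 and 50 mg/kg groups."
    )
    for measurement in find_measurements(sentence):
        print(measurement)
```

Detecting which object and property is being measured (for example, that a dosage refers to a particular drug) is the harder part discussed above and is not attempted in this sketch.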