Co-Constructing Explanations
for AI Systems using Provenance
Prof. Paul Groth | @pgroth | pgroth.com | indelab.org
Thanks to Jan-Christoph Kalo, Fina Polat, Shubha Guha, Enrico Daga
XAI-KG Workshop - ESWC 2025
Explanations increasingly required for AI systems
“While the explanation process involves the assimilation of knowledge,
it also transforms the learner’s knowledge and is thus an
accommodative process.
Thus, rather than referring to "explanations" (and assuming that the
property of being an explanation is a property of statements), it might
be prudent to refer to explaining, and regard explaining an active
exploration process.”
Explanation by Exploration or Self Explanation
S.T. Mueller, R.R. Hoffman, W. Clancey, A. Emrey and G. Klein, Explanation in
Human-AI Systems: A Literature Meta-Review, Synopsis
of Key Ideas and Publications, and Bibliography for Explainable AI, arXiv, 2019.
https://arxiv.org/abs/1902.01876
“The process of explaining, and human-human communication
broadly, is a co-adaptive ‘tuning’ process, which requires that
the explainer and learner have a capacity to take each other's
perspective.”
A collaborative and co-adaptive process
S.T. Mueller, R.R. Hoffman, W. Clancey, A. Emrey and G. Klein, Explanation in
Human-AI Systems: A Literature Meta-Review, Synopsis
of Key Ideas and Publications, and Bibliography for Explainable AI, arXiv, 2019.
https://arxiv.org/abs/1902.01876
Complex AI Systems
Trace-Based Explanation
Shruthi Chari, Daniel M Gruen, Oshani Seneviratne, and Deborah L McGuinness. 2020.
Directions for explainable knowledge-enabled systems. In Knowledge Graphs for eXplainable
Artificial Intelligence: Foundations, Applications and Challenges. IOS Press, 245–261.
Provenance is all you need!
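import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import label_binarize
# KerasClassifier: a scikit-learn-compatible Keras wrapper (import not shown on the slide)

def load_data(…):
    input1 = pd.read_csv(…)
    input1 = input1[input1['attr'] > 10]  # keep only rows with attr > 10
    input2 = pd.read_csv(…)
    return input1.join(input2, …)

def featurise(…):
    # one transformer per feature group
    return ColumnTransformer([
        ('categorical', …, …),
        ('numerical', …, …)])

all_data = load_data(…)
train = all_data[all_data['date'] < …]   # split on date
test = all_data[all_data['date'] >= …]
y_train = label_binarize(train['label'])
y_test = label_binarize(test['label'])

pipeline = Pipeline([
    ('features', featurise(…)),
    ('learner', KerasClassifier(…))])
model = pipeline.fit(train, y_train)
quality = model.score(test, y_test)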
[Figure: (1) user-defined ML pipeline; (2) extracted DAG representation; (3) materialised artifacts with their provenance. The DAG applies selection (σ attr>10, σ date<…, σ date>=…), join (⋈ id=fk_id), projection (π attr, π val, π label), 1-hot encoding, scaling, binarize, concat, FitClassifier and Score. Each materialised row carries a provenance set of (source, row) pairs, e.g. {(1,1), (2,3)}.]
Grafberger, et al. Data distribution debugging in ML pipelines. VLDBJ (2022).
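To make the idea of row-level provenance concrete, here is a minimal sketch in plain Python. It is not the implementation from the cited paper: the helper names (`with_provenance`, `select`, `join`) and the toy rows are assumptions made for illustration. Each row carries a set of (source, row index) pairs; a selection keeps that set unchanged, and a join takes the union of the sets from both sides.

```python
# Minimal sketch: carry row-level provenance through a selection and a join.
# Helper names and toy data are hypothetical, not taken from the cited work.

def with_provenance(rows, source_id):
    """Tag each input row with the (source, 1-based row index) it came from."""
    return [(row, {(source_id, i)}) for i, row in enumerate(rows, start=1)]

def select(tagged, predicate):
    """Filter rows; surviving rows keep their provenance unchanged."""
    return [(row, prov) for row, prov in tagged if predicate(row)]

def join(left, right, left_key, right_key):
    """Inner join; an output row's provenance is the union of both inputs'."""
    out = []
    for lrow, lprov in left:
        for rrow, rprov in right:
            if lrow[left_key] == rrow[right_key]:
                out.append(({**lrow, **rrow}, lprov | rprov))
    return out

# Toy inputs using the column names that appear on the slide.
input1 = [
    {"id": 1, "attr": 12, "label": "a"},
    {"id": 2, "attr": 5, "label": "b"},
    {"id": 3, "attr": 30, "label": "a"},
]
input2 = [
    {"fk_id": 3, "val": 0.46},
    {"fk_id": 1, "val": 0.23},
]

d1 = select(with_provenance(input1, 1), lambda r: r["attr"] > 10)
d2 = with_provenance(input2, 2)
joined = join(d1, d2, "id", "fk_id")

for row, prov in joined:
    print(row["id"], row["val"], sorted(prov))
```

With provenance attached this way, any value that reaches the classifier can be traced back to the input rows that produced it, which is the raw material an explanation can draw on.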
Data Journeys
- Importance of the entire pipeline
- Data Journey: a multi-layered,
semantic representation of a data
processing activity, linked to the
digital assets involved (code,
components, data).
- Can we provide a compact
representation?
Enrico Daga and Paul Groth. "Data journeys: Explaining AI workflows
through abstraction." Semantic Web 15.4 (2024): 1057-1083.
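One way to picture a more compact representation is to group low-level operations under higher-level activities. The sketch below is only an illustration of that idea, not the method of the cited paper: the activity names, the `ACTIVITY_OF` mapping, and the simplification to a linear trace (rather than a full DAG) are all assumptions.

```python
from itertools import groupby

# Hypothetical mapping from low-level operations to higher-level activities.
ACTIVITY_OF = {
    "read_csv": "Load",
    "filter": "Prepare",
    "join": "Prepare",
    "one_hot": "Featurise",
    "scale": "Featurise",
    "fit": "Train",
    "score": "Evaluate",
}

def abstract_trace(operations):
    """Collapse consecutive operations that share an activity into one step."""
    labelled = [(ACTIVITY_OF.get(op, "Other"), op) for op in operations]
    return [
        (activity, [op for _, op in group])
        for activity, group in groupby(labelled, key=lambda pair: pair[0])
    ]

# A linearised trace of the kind of pipeline shown earlier in the deck.
trace = ["read_csv", "filter", "read_csv", "join",
         "one_hot", "scale", "fit", "score"]
journey = abstract_trace(trace)

for activity, ops in journey:
    print(activity, ops)
```

The abstract steps keep a pointer back to the operations they summarise, so a user can start from the compact view and drill down when needed.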
DAG representation of a Random Forests Python Notebook
Example: Random Forests
Results
Initial user survey
Exploration to co-construction
Katharina J Rohlfing et al. 2020. Explanation as a social practice: toward a
conceptual framework for the social design of AI systems. IEEE Transactions on
Cognitive and Developmental Systems, 13, 3, 717–728.
What would it look like for this system?
Prototype
XAI tools and techniques are available
Sina Mohseni, Niloofar Zarei, and Eric D. Ragan. 2021. A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems.
ACM Trans. Interact. Intell. Syst. 11, 3–4, Article 24 (December 2021), 45 pages. https://doi.org/10.1145/3387166
XAI - Evaluation of Explanations
Argument Quality Assessment
▫ Metrics [1]:
▿ functionally-grounded - metrics that do not require human feedback and measure
properties of the explanation (e.g. faithfulness - how accurately does the explanation
correspond to the thing being explained);
▿ human-grounded metrics - metrics that involve human participation either through
feedback, observation or proxy tasks (e.g. how interpretable is an explanation to an
end user);
▿ application-grounded - metrics that measure explanations through their usage in an
application (e.g. does the performance of the human-AI system improve on a
downstream task);
▫ Challenges:
▿ Interactivity
▿ Many personas
Evaluation of Explanations
[1] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for
explainable artificial intelligence: a systematic survey of surveys on
methods and concepts. Data Mining and Knowledge Discovery,
38(5):3043–3101, 2024.
Virtual Personas
Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph
Suh, Widyadewi Soedarmadji, Eran Kohen Behar, and
David Chan. 2024. Virtual Personas for Language Models
via an Anthology of Backstories. In Proceedings of the 2024
Conference on Empirical Methods in Natural Language
Processing, pages 19864–19897, Miami, Florida, USA.
Association for Computational Linguistics.
Virtual Personas - Critique
Lindia Tjuatja, Valerie Chen, Tongshuang Wu,
Ameet Talwalkar, Graham Neubig. Do LLMs
Exhibit Human-like Response Biases? A Case
Study in Survey Design. Transactions of the
Association for Computational Linguistics 2024;
12: 1011–1026. doi: https://doi.org/10.1162/tacl_a_00685
LLMs as judges
Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, and Shuai Ma. 2024.
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges. In Proceedings of
the 2024 Conference on Empirical Methods in Natural Language Processing, pages 16028–16045,
Miami, Florida, USA. Association for Computational Linguistics.
LLMs as judges
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang,
Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric
P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023.
Judging LLM-as-a-judge with MT-bench and Chatbot Arena. In
Proceedings of the 37th International Conference on Neural
Information Processing Systems (NIPS '23). Curran Associates Inc.,
Red Hook, NY, USA, Article 2020, 46595–46623.
https://github.com/JanKalo/enexa_explanation/
1. Clarity & Structure: Does the explanation flow logically? Is it easy to follow?
2. Depth & Completeness: Does the explanation offer sufficient detail without
omitting crucial points?
3. Correctness & Fidelity: Are facts accurate, and does the explanation
remain faithful to the original query/context?
4. Relevance & Focus: Does the content stay on-topic and address user
queries directly?
5. Appropriateness for the Persona: Is the style/tone appropriate for the
user’s persona (e.g., an AI engineer, business strategist, etc.)?
6. Transparency: Does the explanation clarify its reasoning or highlight
uncertainties?
7. Engagement & Intuition: Is the conversation engaging, and does it address
the user’s interests intuitively?
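The seven criteria above can be turned into a rubric for an LLM judge. The following is a minimal sketch under stated assumptions: the prompt wording, the 1 to 5 scale, the `<number>: <score>` reply format, and the unweighted mean are illustrative choices, not the setup used in the linked repository. The judge model call itself is left out; `fake_reply` stands in for a model response.

```python
from statistics import mean

# Criterion names copied from the list above.
CRITERIA = [
    "Clarity & Structure",
    "Depth & Completeness",
    "Correctness & Fidelity",
    "Relevance & Focus",
    "Appropriateness for the Persona",
    "Transparency",
    "Engagement & Intuition",
]

def build_judge_prompt(persona, conversation):
    """Assemble a rubric prompt; wording and the 1-5 scale are assumptions."""
    rubric = "\n".join(f"{i}. {name}" for i, name in enumerate(CRITERIA, start=1))
    return (
        f"You are assessing an explanation dialogue for this persona: {persona}.\n"
        "Rate each criterion from 1 (poor) to 5 (excellent).\n"
        f"{rubric}\n\nDialogue:\n{conversation}\n"
        "Answer with one line per criterion in the form '<number>: <score>'."
    )

def parse_scores(reply):
    """Parse '<number>: <score>' lines into a {criterion: score} mapping."""
    scores = {}
    for line in reply.splitlines():
        index, _, value = line.partition(":")
        if index.strip().isdigit() and value.strip().isdigit():
            position = int(index)
            if 1 <= position <= len(CRITERIA):
                scores[CRITERIA[position - 1]] = int(value)
    return scores

def overall(scores):
    """Unweighted mean over the criteria that received a score."""
    return mean(list(scores.values()))

# Stand-in for a judge model's reply; a real model call would go here.
fake_reply = "1: 4\n2: 3\n3: 5\n4: 4\n5: 4\n6: 2\n7: 3"
scores = parse_scores(fake_reply)
print(scores)
print(round(overall(scores), 2))
```

Keeping the per-criterion scores alongside the overall value makes it possible to see which dimension a given persona found weakest, rather than only a single number.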
Criteria
Results
Steps Forward
• A persona database
• Validation of LLM-as-a-judge
• Multi-modal provenance explanations
• Leverage detailed provenance information with visual and interactive outputs
• Interactive AI agents
• Develop agents that can illustrate, gesture to diagrams, and walk users
through data flow across AI system components in real time
• Enhanced provenance systems:
• Implement retrieval augmented generation over provenance stores
• Integrate LLMs directly into provenance collection and preparation
processes
• Appropriate abstraction levels for user interaction
• Interactive explanation environments
Co-Constructing Explanations for AI Systems using Provenance
from https://www.dagstuhl.de/25051
Complex Use Cases & Common Evaluation Pitfalls
- Law
- Clinical Trials
- Science
- Missing or Incorrect Ground Truth Data
- Data Leakage
- Confirmation Bias
- Deployment mismatch
Desiderata for Evaluation Approaches
1. be able to evaluate multifaceted and complex outputs
2. work when no ground truth is available
3. run in a continuous manner
4. cope with changes in outputs
5. make efficient use of human effort
6. be readily applicable to new problems, tasks and domains with a
minimal amount of effort
7. cope with variation in the tailored outputs
Defining performance in terms of explanation quality
AI System performance:
Given a set of tasks and the corresponding outputs and explanations created by
an AI system, AI System performance is the aggregation of the quality of those
explanations.
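The definition leaves the aggregation function open. As a minimal sketch, assuming each task's explanation has already been given a quality score in [0, 1], the aggregate could be a mean, or a minimum if the weakest explanation should dominate. The function name `system_performance`, the task names, and the scores below are hypothetical.

```python
from statistics import mean

def system_performance(explanation_quality_by_task, aggregate=mean):
    """Aggregate per-task explanation quality into one system-level score.

    `aggregate` is any function from a list of scores to a number;
    the default (arithmetic mean) is an assumption, not part of the definition.
    """
    return aggregate(list(explanation_quality_by_task.values()))

# Hypothetical per-task explanation quality scores.
quality = {"task_a": 0.8, "task_b": 0.6, "task_c": 0.7}

print(system_performance(quality))                  # average quality
print(system_performance(quality, aggregate=min))   # worst-case quality
```

Which aggregate is appropriate depends on the use case: an average suits broad comparisons between systems, while a worst-case view may matter more in high-stakes domains.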
Claim:
By evaluating through explanations we cover the desiderata
on the prior slide
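[Figure: evaluating an AI system through its explanations, in three phases]
Design and Development: objective, task, and context; Explanation Dimension Selection; AI System Development (produces the AI system)
Deployment: AI System Execution (produces execution results and execution traces); Explanation Generation (produces the explanation)
Evaluation: Explanation Assessment, using the selected dimensions and metrics (produces the evaluation result)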
Conclusion
• AI systems build explanations together with users. It’s a process.
• We need better ways to evaluate such processes.
• LLMs as proxy personas and LLMs as judges may allow for extensive
and reproducible evaluations of explanations.
• Explanation as evaluation for AI Systems
Paul Groth | @pgroth | pgroth.com | indelab.org

cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
AmirStern2
 
Oracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI ProfessionalOracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI Professional
VICTOR MAESTRE RAMIREZ
 
Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025
Safe Software
 
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
Jasper Oosterveld
 
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Safe Software
 
MCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
MCP vs A2A vs ACP: Choosing the Right Protocol | BluebashMCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
MCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
Bluebash
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 
Jira Administration Training – Day 1 : Introduction
Jira Administration Training – Day 1 : IntroductionJira Administration Training – Day 1 : Introduction
Jira Administration Training – Day 1 : Introduction
Ravi Teja
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
6th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 20256th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 2025
DanBrown980551
 
7 Salesforce Data Cloud Best Practices.pdf
7 Salesforce Data Cloud Best Practices.pdf7 Salesforce Data Cloud Best Practices.pdf
7 Salesforce Data Cloud Best Practices.pdf
Minuscule Technologies
 
FME Beyond Data Processing Creating A Dartboard Accuracy App
FME Beyond Data Processing Creating A Dartboard Accuracy AppFME Beyond Data Processing Creating A Dartboard Accuracy App
FME Beyond Data Processing Creating A Dartboard Accuracy App
Safe Software
 
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
Ad

Co-Constructing Explanations for AI Systems using Provenance

  • 1. Co-Constructing Explanations for AI Systems using Provenance Prof. Paul Groth | @pgroth | pgroth.com | indelab.org Thanks to Jan-Christoph Kalo, Fina Polat, Shubha Guha, Enrico Daga XAI-KG Workshop - ESWC 2025
  • 3. Explanation by Exploration or Self Explanation: “While the explanation process involves the assimilation of knowledge, it also transforms the learner’s knowledge and is thus an accommodative process. Thus, rather than referring to "explanations" (and assuming that the property of being an explanation is a property of statements), it might be prudent to refer to explaining, and regard explaining an active exploration process.” S.T. Mueller, R.R. Hoffman, W. Clancey, A. Emrey and G. Klein, Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI, arXiv, 2019. https://arxiv.org/abs/1902.01876
  • 4. A collaborative and co-adaptive process: “The process of explaining, and human-human communication broadly, is a co-adaptive ‘tuning’ process, which requires that the explainer and learner have a capacity to take each other's perspective.” S.T. Mueller, R.R. Hoffman, W. Clancey, A. Emrey and G. Klein, Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI, arXiv, 2019. https://arxiv.org/abs/1902.01876
  • 6. Trace-Based Explanation Shruthi Chari, Daniel M Gruen, Oshani Seneviratne, and Deborah L McGuinness. 2020. Directions for explainable knowledge-enabled systems. In Knowledge Graphs for eXplainable Artificial Intelligence: Foundations, Applications and Challenges. IOS Press, 245–261.
  • 7. Provenance is all you need!

        # Load two inputs, filter the first, and join them
        def load_data(…):
            input1 = pd.read_csv(…)
            input1 = input1[input1['attr'] > 10]
            input2 = pd.read_csv(…)
            return input1.join(input2, …)

        # Encode categorical and numerical columns
        def featurise(…):
            return ColumnTransformer(
                [('categorical', …, …),
                 ('numerical', …, …)])

        # Split by date and binarise the labels
        all_data = load_data(…)
        train = all_data[all_data['date'] < …]
        test = all_data[all_data['date'] >= …]
        y_train = label_binarize(train['label'])
        y_test = label_binarize(test['label'])

        # Fit the pipeline and score it on the held-out data
        pipeline = Pipeline([
            ('features', featurise(…)),
            ('learner', KerasClassifier(…))])
        model = pipeline.fit(train, y_train)
        quality = model.score(test, y_test)

    [Figure: (1) User-defined ML pipeline; (2) Extracted DAG representation, with selections (σ attr>10, σ date<…, σ date>=…), a join (⋈ id=fk_id), projections (π attr, π val, π label), 1-hot, scale, concat, binarize, FitClassifier and Score operators; (3) Materialised artifacts with their provenance.]
    Grafberger, et al. Data distribution debugging in ML pipelines. VLDBJ (2022).
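To make the row-level provenance idea on this slide concrete, here is a minimal sketch in plain Python rather than pandas. Each row carries the set of (source, row index) pairs it was derived from, so after a selection and a join one can ask which input rows contributed to an output row. The helper names (`tag`, `select`, `join`) and the sample data are hypothetical, and this is not the implementation described by Grafberger et al.

```python
from typing import Callable

# A row is (values, provenance); provenance is a frozenset of (source, row_index).
Row = tuple[dict, frozenset]

def tag(source: str, records: list[dict]) -> list[Row]:
    """Attach initial provenance: each record stems from exactly one source row."""
    return [(rec, frozenset({(source, i)})) for i, rec in enumerate(records)]

def select(rows: list[Row], pred: Callable[[dict], bool]) -> list[Row]:
    """Selection keeps the provenance of surviving rows unchanged."""
    return [(rec, prov) for rec, prov in rows if pred(rec)]

def join(left: list[Row], right: list[Row], lkey: str, rkey: str) -> list[Row]:
    """Inner equi-join; output provenance is the union of both inputs' provenance."""
    out = []
    for lrec, lprov in left:
        for rrec, rprov in right:
            if lrec[lkey] == rrec[rkey]:
                out.append(({**lrec, **rrec}, lprov | rprov))
    return out

# Hypothetical sample data mirroring the column names on the slide.
input1 = tag("input1", [{"id": 1, "attr": 12}, {"id": 2, "attr": 5}, {"id": 3, "attr": 30}])
input2 = tag("input2", [{"fk_id": 3, "val": 0.46}, {"fk_id": 1, "val": 0.23}])

joined = join(select(input1, lambda r: r["attr"] > 10), input2, "id", "fk_id")
for rec, prov in joined:
    print(rec, sorted(prov))
```

Keeping provenance as a set of source row identifiers means a later question such as "which raw rows influenced this training example?" becomes a simple lookup on the output row.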
  • 8. Data Journeys - Importance of the entire pipeline - Data Journey: a multi-layered, semantic representation of a data processing activity, linked to the digital assets involved (code, components, data). - Can we provide a compact representation? Enrico Daga and Paul Groth. "Data journeys: Explaining AI workflows through abstraction." Semantic Web 15.4 (2024): 1057-1083.
  • 9. DAG representation of a Random Forests Python Notebook
  • 13. Exploration to co-construction Katharina J Rohlfing et al. 2020. Explanation as a social practice: toward a conceptual framework for the social design of ai systems. IEEE Transactions on Cognitive and Developmental Systems, 13, 3, 717–728.
  • 14. What would it look like for this system?
  • 17. XAI tools and techniques are available Sina Mohseni, Niloofar Zarei, and Eric D. Ragan. 2021. A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems. ACM Trans. Interact. Intell. Syst. 11, 3–4, Article 24 (December 2021), 45 pages. https://doi.org/10.1145/3387166
  • 18. XAI - Evaluation of Explanations
  • 20. Evaluation of Explanations ▫ Metrics [1]: ▿ functionally-grounded - metrics that do not require human feedback and measure properties of the explanation (e.g. faithfulness - how accurately does the explanation correspond to the thing being explained); ▿ human-grounded - metrics that involve human participation either through feedback, observation or proxy tasks (e.g. how interpretable is an explanation to an end user); ▿ application-grounded - metrics that measure explanations through their usage in an application (e.g. does the performance of the human-AI system improve on a downstream task); ▫ Challenges: ▿ Interactivity ▿ Many personas [1] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, 38(5):3043–3101, 2024.
  • 21. Virtual Personas Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, and David Chan. 2024. Virtual Personas for Language Models via an Anthology of Backstories. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 19864–19897, Miami, Florida, USA. Association for Computational Linguistics.
  • 22. Virtual Personas - Critique Lindia Tjuatja, Valerie Chen, Tongshuang Wu, Ameet Talwalkar, Graham Neubig; Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design. Transactions of the Association for Computational Linguistics 2024; 12: 1011–1026. doi: https://doi.org/10.1162/tacl_a_00685
  • 23. LLMs as judges Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, and Shuai Ma. 2024. Leveraging Large Language Models for NLG Evaluation: Advances and Challenges. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 16028–16045, Miami, Florida, USA. Association for Computational Linguistics.
  • 24. LLMs as judges Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023. Judging LLM-as-a-judge with MT-bench and Chatbot Arena. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23). Curran Associates Inc., Red Hook, NY, USA, Article 2020, 46595–46623.
  • 26. Criteria 1. Clarity & Structure: Does the explanation flow logically? Is it easy to follow? 2. Depth & Completeness: Does the explanation offer sufficient detail without omitting crucial points? 3. Correctness & Fidelity: Are facts accurate, and does the explanation remain faithful to the original query/context? 4. Relevance & Focus: Does the content stay on-topic and address user queries directly? 5. Appropriateness for the Persona: Is the style/tone appropriate for the user’s persona (e.g., an AI engineer, business strategist, etc.)? 6. Transparency: Does the explanation clarify its reasoning or highlight uncertainties? 7. Engagement & Intuition: Is the conversation engaging, and does it address the user’s interests intuitively? 18.03.25
  • 29. Steps Forward • A persona database • Validation of LLM-as-a-judge • Multi-modal provenance explanations • Leverage detailed provenance information with visual and interactive outputs • Interactive AI agents • Develop agents that can illustrate, gesture to diagrams, and walk users through data flow across AI system components in real-time • Enhanced provenance systems: • Implement retrieval augmented generation over provenance stores • Integrate LLMs directly into provenance collection and preparation processes • Appropriate abstraction levels for user interaction • Interactive explanation environments
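As a rough sketch of how the seven criteria above might be handed to an LLM judge, the snippet below builds a rubric prompt for a given persona and validates a judge reply. The 1 to 5 scale, the JSON reply format, and the function names `build_judge_prompt` and `parse_scores` are assumptions made for illustration; the slides do not specify them, and no model is called here (the reply is a hand-written stand-in).

```python
import json

# Criterion names taken from the slide; everything else is illustrative.
CRITERIA = [
    "Clarity & Structure",
    "Depth & Completeness",
    "Correctness & Fidelity",
    "Relevance & Focus",
    "Appropriateness for the Persona",
    "Transparency",
    "Engagement & Intuition",
]

def build_judge_prompt(persona: str, explanation: str) -> str:
    """Assemble a rubric prompt asking for one integer score per criterion."""
    rubric = "\n".join(f"- {name}" for name in CRITERIA)
    return (
        f"You are judging an explanation written for this persona: {persona}\n"
        f"Score each criterion from 1 (poor) to 5 (excellent):\n{rubric}\n"
        "Reply with a JSON object mapping each criterion name to its score.\n\n"
        f"Explanation:\n{explanation}"
    )

def parse_scores(raw: str, low: int = 1, high: int = 5) -> dict[str, int]:
    """Validate a judge reply: every criterion present, each an integer in range."""
    data = json.loads(raw)
    scores = {}
    for name in CRITERIA:
        value = data.get(name)
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"missing or non-integer score for {name!r}")
        if not low <= value <= high:
            raise ValueError(f"score for {name!r} outside {low}..{high}")
        scores[name] = value
    return scores

prompt = build_judge_prompt("AI engineer", "The model was trained on rows with attr > 10.")
stand_in_reply = json.dumps({name: 4 for name in CRITERIA})  # hand-written, not model output
scores = parse_scores(stand_in_reply)
```

Rejecting incomplete or out-of-range replies up front keeps malformed judge output from silently skewing later aggregation.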
  • 32. Complex Use Cases & Common Evaluation Pitfalls - Law - Clinical Trials - Science - Missing or Incorrect Ground Truth Data - Data Leakage - Confirmation Bias - Deployment mismatch
  • 33. Desiderata for Evaluation Approaches 1. be able to evaluate such multifaceted and complex outputs 2. work when no ground truth is available 3. run in a continuous manner 4. cope with changes in outputs 5. make efficient use of human effort 6. be readily applicable to new problems, tasks and domains with a minimal amount of effort 7. cope with variation in the tailored outputs
  • 34. Defining performance in terms of explanation quality AI System performance: Given a set of tasks and the corresponding outputs and explanations created by an AI system, AI System performance is the aggregation of the quality of the explanations. Claim: By evaluating through explanations we cover the desiderata on the prior slide
  • 35. [Figure: evaluation workflow in three phases. Design and Development: Explanation Dimension Selection, AI System Development. Deployment: AI System Execution, Explanation Generation. Evaluation: Explanation Assessment. Labelled flows: objective, task, and context; AI system; execution results; execution traces; explanation; dimensions and metrics; evaluation result.]
  • 36. Conclusion • AI systems build explanations together with users. It’s a process. • We need better ways to evaluate such processes • LLMs as proxy personas and LLMs as judges may allow for extensive and reproducible evaluations of explanations. • Explanation as evaluation for AI Systems Paul Groth | @pgroth | pgroth.com | indelab.org