AI-ASSISTED METAMORPHIC
TESTING FOR DOMAIN-SPECIFIC
MODELLING AND SIMULATION
Computer Science Department
Universidad Autónoma de Madrid (Spain) http://miso.es
Juan de Lara
joint work with
Pablo Gómez-Abajo, Pablo C. Cañizares, Esther Guerra and Alberto Núñez
WHERE I COME FROM
Universidad Autónoma de Madrid
• Established in 1968
• North part of Madrid (campus Cantoblanco)
• One of the top universities in Spain
• >30000 students
• “Excellence” campus with CSIC (Spanish Research Council)
Computer Science and Telecom. Engineering
• Created in 1992
• 96 full-time professors
• Joint diploma Comp.Sci.-Maths
2
MISO GROUP
Modelling and software engineering research group
5 professors, 4 research associates, 3 PhD students
Automate software construction
• Model-driven engineering
• Domain-specific languages
• AI – LLMs
• Testing
3
http://miso.es
TODAY’S TALK
How to test software that is hard to test because:
• Unclear oracles (exact expected output for a given input)
• Complex input data (e.g., model-like)
Examples
• Machine learning systems
• Autonomous driving systems
• Compilers
• Image/signal processing software
• Scientific computing applications
• Chess engines
• …
Often, simulators fall into this category
• Climate, physics engines, economic models, cloud simulators
4
METAMORPHIC TESTING (MT)
5
Imagine we need to test the software for a calculator
Let's start with the sin function
in=31.5,
expected=0.5224
…
in=-1.2,
expected=−0.0209
System under test
(SUT) Test suite
Test suite = collection of test cases
Test case = input data + oracle
(expected outputs)
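For illustration (not from the slides), a conventional test case for sin could look like this in Scala, with the exact expected output hard-coded as the oracle:

object SinExactOracleTest extends App {
  val input    = 31.5                                // degrees, as in the slide
  val expected = 0.5224                              // exact oracle value
  val actual   = math.sin(math.toRadians(input))     // run the SUT
  assert(math.abs(actual - expected) < 1e-3, s"sin($input) was $actual, expected $expected")
  println("test passed")
}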
METAMORPHIC TESTING (MT)
6
in=31.5
SUT
Input data
Test execution
out=0.5224
Oracle
Output
0.5224
in=31.5,
expected=0.5224
…
in=-1.2,
expected=−0.0209
Test suite
==
Test passes
METAMORPHIC TESTING (MT)
How to build a test suite?
7
Test case input Test case oracle
SUT
31.5 function sine
-1.2
sin(31.5)==0.522498565 ?
sin(-1.2)==-0.0209... ?
How do we include an input value that is not in the test suite?
If the mechanism that computes these expected values (the oracle) is not available, then we
run into the oracle problem
Test case result
Oracle (exact expected output) vs partial oracle
METAMORPHIC TESTING (MT)
8
How does MT alleviate this challenge?
• Rather than using specific values, MT is based on underlying knowledge of the SUT:
Use trigonometric properties of the sin function, like:
−sin(x) = sin(-x)
Test case input Test case oracle
SUT
77 function sin -sin (77)
==
sin (-77)
Test case result
METAMORPHIC TESTING (MT)
9
This allows us to easily generate test cases with oracles by:
• Taking any input x [source test case]
• Generating another input -x [follow-up test case]
• Execute sine on x
• Execute sine on -x
• Compare results
The sine trigonometric property is a Metamorphic Relation (MR)
• Involves checking properties of two or more system inputs [x2 = -x1]
• And testing if the outputs are related as expected [-sin(x1) == sin(x2)]
• MR: [x2 == -x1] implies [-sin(x1) == sin(x2)]
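As a minimal sketch (assuming a simple wrapper around math.sin as the SUT), the MR can be turned into an executable check that needs no exact expected value:

object SinMetamorphicTest extends App {
  def sut(x: Double): Double = math.sin(math.toRadians(x))   // system under test

  val x1 = 77.0                // source test case (any input works)
  val x2 = -x1                 // follow-up test case derived by the MR: x2 == -x1

  val out1 = sut(x1)
  val out2 = sut(x2)

  // MR oracle: -sin(x1) == sin(x2), up to floating-point error
  assert(math.abs(-out1 - out2) < 1e-9, "metamorphic relation violated")
  println(s"MR holds: -($out1) == $out2")
}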
METAMORPHIC TESTING’S
ORACLE
10
The oracle uses multiple inputs
in1=77
SUT
Input data
out1=0.9744
Outputs
in2=-77 out2=-0.9744
Oracle
-out1 == out2?
HOW IS THIS RELATED TO
SIMULATION?
Imagine we'd like to test a cloud simulator
11
Cloud
model +
workload
input
Report
(exec. time)
output
What’s the oracle here?
Cloud
Simulator
HOW IS THIS RELATED TO
SIMULATION?
Metamorphic
testing to the
rescue!
12
Cloud Model
+
workload
input1
Report
(exec. time)
output1
Cloud model
(fewer computing nodes)
+
same workload
input2
Report
(exec. time)
output2
The time reported for input1 (the full model) should be lower than or equal to the one for input2
Cloud Simulator
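A hedged sketch of this MR as an executable oracle (CloudModel, Report and runSimulator are illustrative stand-ins, not the real simulator API):

object CloudMrSketch extends App {
  case class CloudModel(numNodes: Int)              // simplified input model
  case class Report(execTime: Double)               // simplified simulator report

  // hypothetical wrapper around the cloud simulator under test
  def runSimulator(model: CloudModel, workload: String): Report =
    Report(execTime = 100.0 / model.numNodes)       // stand-in behaviour for the real SUT

  val m1 = CloudModel(numNodes = 8)                 // source model
  val m2 = CloudModel(numNodes = 4)                 // follow-up: fewer nodes, same workload
  val out1 = runSimulator(m1, "workload")
  val out2 = runSimulator(m2, "workload")

  // MR oracle: more nodes and same workload implies lower or equal execution time
  assert(out1.execTime <= out2.execTime, "MR violated")
}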
HOW IS THIS RELATED TO
SIMULATION?
Create a set of MRs capturing expert knowledge in the simulators’ domain
Create a set of seed test cases
• Prototypical cloud models
Use the MRs to create follow-up test cases
• Use the MRs to derive cloud models that satisfy the MR’s pre-condition
Use the MRs as oracles between seeds and follow-ups
We can use metamorphic testing to benchmark simulators in a given field
• Do they pass all expected MRs?
13
HOWEVER…
Building an MT environment for a domain is hard
• Complex inputs: parsing is needed to describe metamorphic relations
• Outputs: parsing is needed to check metamorphic relations
• Metamorphic relations:
• Express them in an understandable way
• Flexibility to change and create new ones while testing
• Automated generation of follow-ups
• Statistics of the testing process
• Failure rate of every metamorphic relation
• Analysis of inputs making a metamorphic relation fail
• Capabilities to compare several SUTs (e.g., benchmark simulators)
14
OUR GOAL
Facilitate the construction of MT environments for various domains
• Model-driven engineering for this purpose
• Use meta-modelling and transformations
Practical framework
• Gotten: an Eclipse plugin to build MT environments
• Considers the comparison of alternative SUTs
• Domain-independent: applicable to systems in different domains
• Follow-up generation
• Domain Specific Language for MRs
15
Facilitate the MT process by using AI
• Conversational assistants based on LLMs
• Help in deriving MRs, explaining MRs, generating follow-ups
AGENDA
Model-driven engineering
• Cloud simulation as running example
The Gotten framework
• Benchmarking cloud simulators using MT
AI-based MT
• Assess its potential to help in the MT process
16
MODEL-DRIVEN ENGINEERING
Use models to automate the different phases of software development
• Specification
• Testing
• Verification
• Simulation
• Code generation
• …
Models described using:
• General purpose modelling languages (e.g., UML)
• Domain-specific languages
17
DOMAIN-SPECIFIC
LANGUAGES (DSLs)
Develop software with higher levels of quality and productivity
Increase the level of abstraction: models
Fewer “accidental” details; notation and language closer to the problem
Models are not just documentation
• They are used to generate code, they can be executed
• Specific and well-understood domains
18
DOMAIN-SPECIFIC
LANGUAGES: PARTS
Abstract syntax
• Concepts and primitives of the domain
• Described via a meta-model
Concrete syntax
• How the models are represented and interacted with
• Textual, graphical, tabular, etc.
Semantics
• What models mean
• Via transformations (to other models, to code,
rewriting/simulation)
19
val sim = new Simulation
// Power model for a host
val simplePowerModel = new PowerModel {
override def getPower(cpuUtil: Double): Double = {
100 + (cpuUtil * 200) // idle power plus a utilisation-dependent term
}
}…
META-MODELS
AND MODELS
20
ComputeNode
name: String
cpuCores: int
ramGb: int
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
meta-model
model
csc1: Switch
name=“sw1”
ports=48
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
«conforms to»
Meta-model
• Primitives of the language, properties, relations
• Conceptual model of the domain
• ≈ class diagram (classes, attributes, associations)
Model
• Instance of the meta-model
• ≈ object diagram (objects, slots, links)
0..1
switch
*
nodes
:switch
Switch
name: String
ports: int
speed: double
FULLY PRECISE AT ALL TIMES:
OCL INTEGRITY CONSTRAINTS
21
Meta-model
• Integrity constraints, invariants
• Object Constraint Language (OCL)
«conforms to»
positivePorts inv:
self.ports > 0
enoughPorts inv:
self.ports>= self.nodes->size()
ComputeNode
name: String
cpuCores: int
ramGb: int
meta-model
0..1
switch
*
nodes
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
model
csc1: Switch
name=“sw1”
ports=1
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
:switch
Model
• Should satisfy each invariant
Switch
name: String
ports: int
speed: double
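For illustration only, the two invariants can be read as boolean checks over a small in-memory model (plain Scala case classes standing in for EMF objects); the model on this slide, with one port and two connected nodes, violates enoughPorts:

object OclInvariantsSketch extends App {
  case class ComputeNode(name: String, cpuCores: Int, ramGb: Int)
  case class Switch(name: String, ports: Int, speed: Double, nodes: List[ComputeNode])

  def positivePorts(sw: Switch): Boolean = sw.ports > 0              // self.ports > 0
  def enoughPorts(sw: Switch): Boolean   = sw.ports >= sw.nodes.size // self.ports >= self.nodes->size()

  val n1 = ComputeNode("ML infer", 24, 256)
  val n2 = ComputeNode("ML train", 32, 512)
  val sw = Switch("sw1", ports = 1, speed = 100, nodes = List(n1, n2))

  println(s"positivePorts=${positivePorts(sw)}, enoughPorts=${enoughPorts(sw)}")  // true, false
}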
FULLY PRECISE AT ALL TIMES:
OCL INTEGRITY CONSTRAINTS
22
Meta-model
• Integrity constraints, invariants
• Object Constraint Language (OCL)
«conforms to»
ComputeNode
name: String
cpuCores: int
ramGb: int
meta-model
Switch
name: String
ports: int
speed: double
0..1
switch
*
nodes
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
model
csc1: Switch
name=“sw1”
ports=48
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
:switch
Model
• Should satisfy each invariant
positivePorts inv:
self.ports > 0
enoughPorts inv:
self.ports>= self.nodes->size()
DSLs :
CONCRETE SYNTAX
23
(conditional style)
Graphical concrete syntax specification
Abstract syntax Concrete syntax
ComputeNode
name: String
cpuCores: int
ramGb: int
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
meta-model
model
csc1: Switch
name=“sw1”
ports=48
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
«conforms to»
Switch
ports: int
speed: double
0..1
switch
*
nodes
:switch
«name»
«cpuCores» cores
«ramGb» Gb RAM
«ports» ports
«speed» G
ML infer
24 cores
256 Gb RAM
ML train
32 cores
512 Gb RAM
48 ports
100 G
ports==nodes.size()
«name»
sw1
DSLs :
CONCRETE SYNTAX
24
(conditional style)
Graphical concrete syntax specification
Abstract syntax Concrete syntax
ComputeNode
name: String
cpuCores: int
ramGb: int
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
meta-model
model
csc1: Switch
name=“sw1”
ports=2
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
«conforms to»
Switch
ports: int
speed: double
0..1
switch
*
nodes
:switch
«name»
«cpuCores» cores
«ramGb» Gb RAM
«ports» ports
«speed» G
ML infer
24 cores
256 Gb RAM
ML train
32 cores
512 Gb RAM
2 ports
100 G
ports==nodes.size()
«name»
sw1
DSLs :
CONCRETE SYNTAX
25
Textual syntax specification
Concrete syntax
Node “ML infer” {
cpuCores: 24
ram: 256
switch: sw1
}
Node “ML train” {
cpuCores: 32
ram: 512
}
ComputeNode returns ComputeNode:
‘Node’ name=String ‘{‘
‘cpuCores:’ cpuCores=Int
‘ram:’ ramGb=Int
(‘switch:’ switch=[Switch|String])?
‘}’
…
Abstract syntax
ComputeNode
name: String
cpuCores: int
ramGb: int
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
meta-model
model
csc1: Switch
name=“sw1”
ports=48
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
«conforms to»
Switch
ports: int
speed: double
0..1
switch
*
nodes
:switch
Switch sw1 {
ports: 48
speed: 100
nodes: “ML infer”,
“ML train”
}
DSL-BASED AUTOMATION
SOLUTIONS
26
Model
ML infer
24 cores
256 Gb RAM
ML train
32 cores
512 Gb RAM
2 ports
100 G
sw1
What can we do with such a model?
MODEL-TO-MODEL
TRANSFORMATION
27
[Figure: a data-centre model (nodes “ML infer” and “ML train” connected to switch sw1) is mapped by an M2M transformation to a Time Petri Net with send/forward transitions and time intervals [0, 0.1] and [1, 10], which can be analysed with a Petri nets tool]
Create another model, possibly conforming to a different meta-model
• Use that other model for analysis or simulation
• Results can be transferred back to the original model (back-annotation)
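A heavily simplified sketch of such a model-to-model transformation (the names and the Petri-net structure are illustrative; real transformations, e.g. written in ATL or with EMF APIs, also create places, arcs and the forwarding structure of the figure):

object M2MSketch extends App {
  case class ComputeNode(name: String, cpuCores: Int, ramGb: Int)
  case class TimedTransition(name: String, minDelay: Double, maxDelay: Double)

  // map every compute node of the source model to a timed "send" transition
  def toPetriNet(nodes: List[ComputeNode]): List[TimedTransition] =
    nodes.map(n => TimedTransition(s"send ${n.name}", 0.0, 0.1))    // interval [0, 0.1] as in the figure

  println(toPetriNet(List(ComputeNode("ML infer", 24, 256), ComputeNode("ML train", 32, 512))))
}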
IN-PLACE MODEL
TRANSFORMATION
28
[Figure: the same data-centre model and its Time Petri Net view; an in-place M2M transformation rewrites the data-centre model to simulate it, with a task at time=0]
Rewrite the model
• Simulation or animation
• Optimisation or refactoring
29
[Figure: the same models after one in-place transformation step; the simulated task is now at time=0.1]
IN-PLACE MODEL
TRANSFORMATION
Rewrite the model
• Simulation or animation
• Optimisation or refactoring
30
[Figure: the same models after further in-place transformation steps; the simulation time is now 0.4]
IN-PLACE MODEL
TRANSFORMATION
Rewrite the model
• Simulation or animation
• Optimisation or refactoring
MODEL-TO-TEXT
TRANSFORMATION
31
[Figure: the data-centre model is additionally fed to a model-to-text (M2text) transformation, which produces the text below]
Text/Code
The following is the data-center physical model:
Node “ML infer” has 24 cores and 256 GB of RAM
Node “ML train” has 32 cores and 512 GB of RAM
All nodes are connected to switch sw1, with 2 ports
And speed of 100G
Generate text from the model
• Code generation
• Documentation, HTML pages
• Prompts for an LLM
• …
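A minimal sketch of a model-to-text transformation producing exactly that kind of text (in practice this would use a template language such as Acceleo or Xtend; plain string interpolation is used here and all names are illustrative):

object M2TextSketch extends App {
  case class ComputeNode(name: String, cpuCores: Int, ramGb: Int)
  case class Switch(name: String, ports: Int, speed: Int)

  def describe(nodes: List[ComputeNode], sw: Switch): String = {
    val header    = "The following is the data-center physical model:"
    val nodeLines = nodes.map(n => s"""Node "${n.name}" has ${n.cpuCores} cores and ${n.ramGb} GB of RAM""")
    val swLine    = s"All nodes are connected to switch ${sw.name}, with ${sw.ports} ports and speed of ${sw.speed}G"
    (List(header) ++ nodeLines ++ List(swLine)).mkString("\n")      // the generated prompt text
  }

  println(describe(List(ComputeNode("ML infer", 24, 256), ComputeNode("ML train", 32, 512)), Switch("sw1", 2, 100)))
}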
DSL-BASED AUTOMATION
SOLUTIONS
32
[Figure: as before, the model-to-text transformation renders the data-centre model as the text shown previously, now used as the prompt for a conversational assistant]
Conversational
assistant
Is there direct communication
between nodes ML infer
and ML train?
Generate text from the model
• Code generation
• Documentation, HTML pages
• Prompts for an LLM
• …
DSL-BASED AUTOMATION
SOLUTIONS
33
[Figure: the same setup; the conversational assistant answers questions about the model using the generated textual description]
Conversational
assistant
Yes, they are both connected to
switch sw1, which implies the
latency would be minimal
Generate text from the model
• Code generation
• Documentation, HTML pages
• Prompts for an LLM
• …
MDE FOR METAMORPHIC TESTING
We can use MDE to facilitate the construction of MT environments
DSLs to describe the inputs
• Cloud models
Model-to-text transformations
• Generate input textual artefacts for the cloud simulator(s)
DSL for expressing the metamorphic relations
• Use OCL for expressing features of the input model
• Parse the SuT outputs to obtain output features
Use search-based model transformation for follow-up generation
34
MDE FOR METAMORPHIC
TESTING
35
Application
expert
Create
domain MMs
Define SuT(s)
execution
start
EMF
ext.
point
Domain
expert
Define MRs
Fine-tune
follow-up
generation
mrDSL
fowDSL
Create input
test cases
Generate
follow-ups
Metamorphic
testing
Define MRs
satisfactory
results?
end
Tester
[yes]
[no]
GOTTEN
A (SIMPLE) META-MODEL FOR
CLOUD SYSTEMS
36
A (SIMPLE) META-MODEL FOR
CLOUD SYSTEMS
37
«conforms to»
DSL FOR METAMORPHIC RELATIONS
38
metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2
models "/sample.gotten/model/dcmodels"
metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2
models "/sample.gotten/model/workloads"
datacentre input Features {
context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum()
context DataCentre def: CPU: Int = racks->collect(
numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum()
}
output Features {
Time: Long
Energy: Long
}
Processor {
Name: String
Version: String
}
MetamorphicRelations {
MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ]
MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ]
}
DSL FOR METAMORPHIC RELATIONS
39
• Meta-models of the SuT inputs
• Model variables
DSL FOR METAMORPHIC RELATIONS
40
• Domain features of input models
• Expressed in OCL
DSL FOR METAMORPHIC RELATIONS
41
• Features of execution outputs
• Extracted from the executions
DSL FOR METAMORPHIC RELATIONS
42
Characterisation of alternative SUTs
(like different simulators)
DSL FOR METAMORPHIC RELATIONS
43
Metamorphic relations:
• Can use input and output features
• Can use the model variables (m1, m2, w1, w2) defined above
DSL FOR METAMORPHIC RELATIONS
44
MR1:
if the cloud m1 has more compute nodes than m2 and workloads are equal
then
the cloud m1 takes less time (or equal) to process the workload than m2
DSL FOR METAMORPHIC RELATIONS
45
MR2:
if the cloud m1 has better CPUs than m2 and workloads are equal
then
the cloud m1 consumes less energy (or equal) to process the workload than m2
PROCESSORS
Specify
• How to transform the input models to the SuT input
• How to run the SuT on specific inputs
• How to parse the output to obtain the output features (e.g., Energy, Time)
Done via extension points defined by Gotten
• Eclipse mechanism for extensibility
46
generate(...)
execute(...)
getFeatures(…)
Processor
«interface»
«implements»
System
under test 1
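A hedged sketch of what such a processor could look like (the method names follow the slide, but the actual Gotten extension-point interface may differ; MyCloudSimProcessor is hypothetical):

trait Processor {
  def generate(inputModel: java.io.File): java.io.File    // transform the input model into the SUT's native input
  def execute(sutInput: java.io.File): String              // run the SUT and return its raw output
  def getFeatures(rawOutput: String): Map[String, Long]    // parse output features, e.g. "Time", "Energy"
}

class MyCloudSimProcessor extends Processor {
  def generate(inputModel: java.io.File): java.io.File = ???   // e.g. a model-to-text transformation to config files
  def execute(sutInput: java.io.File): String = ???            // e.g. launch the simulator as an external process
  def getFeatures(rawOutput: String): Map[String, Long] =
    Map("Time" -> 0L, "Energy" -> 0L)                          // parse the simulator's report here
}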
FOLLOW UPS
47
MR
Follow-up
Generation DSL
Transf. rules
Search config
MoMOT
Seed models
Follow-ups
Martin Fleck, Javier Troya, Manuel Wimmer:
Search-Based Model Transformations with MOMoT. ICMT 2016: 79-87
FOLLOW-UPS: FOWDSL
48
// MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ]
// context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum()
followups for datacentre using MR1
with source folder = "/dcmodels"
and output folder = "/dcmodels"
NNodes ->
delete [1..10] Rack;
delete [1..10] Board;
decrease [1..10] Rack.numBoards;
decrease [1..10] Board.nodesPerBoard
maxSolutions 10
iterations 10
algorithms [Random, NSGAII, NSGAIII, eMOEA]
FOLLOW-UPS: FOWDSL
49
followups for datacentre using MR1
with source folder = "/dcmodels"
and output folder = "/dcmodels"
NNodes ->
decrease [1..4] Rack.numBoards keeping {Rack.numBoards > 0};
decrease [1..4] Board.nodesPerBoard keeping {Board.nodesPerBoard > 0}
maximize ( NNodes(m2) - NNodes(m1) )
maxSolutions 10
iterations 8
algorithms [Random, NSGAII, NSGAIII, eMOEA]
FOLLOW UPS
50
GOTTEN
51
Open source, freely available at
https://g0tten.github.io/home.html
ASSESSING THE APPROACH
52
RQ1: How effective is Gotten for specifying MT environments?
• Is it better than specifying the environment by hand?
RQ2: How suitable are the environments built with Gotten?
• Can we perform MT effectively?
RQ3: Can Gotten be used to create MT environments for different domains?
• Videostreaming
• Automata
(RQ1) EFFECTIVENESS IN
SPECIFYING MT ENVIRONMENTS
53
Cloud simulation domain
• Cost of integrating 2 cloud simulators (Dissect, CloudSimStorage) within Gotten
• Baseline: ad-hoc MT testing tool called FwCloudMeT
FwCloudMeT
• Fixed set of MRs
• Native follow-up generation
Gotten
• Extensible set of MRs (DSL)
• Follow-ups via MoMOT
One order of magnitude less
code in Gotten
(RQ2) SUITABILITY OF GOTTEN
FOR MT
54
Case study: cloud simulation
Definition of MRs
• 6 MRs
Generating follow-ups
• 200 follow-ups for the 6 MRs
• ~11 secs per follow-up
MT process
• 4 passing MRs for both simulators
• Both passing and failing test cases for 1 MR
• 1 failing MR for both simulators
MRs FOR CLOUD SIMULATION
55
better CPU → less energy
machine increase → increase ratio bigger than energy ratio
better storage → less time
56
MRs FOR CLOUD SIMULATION
better network → less time
better memory → less time
shorter workload → less time
MR5 (“better memory implies less or equal processing time”)
• Neither of the two simulators satisfies this relation
• Limitation in the handling of the memory system
MR2 (“The proportional increase in machines should be greater than or equal to the
proportional increase in energy usage”)
• Some scenarios do not meet this expectation
• Some idle machines, inefficient scheduling
57
TESTING PROCESS
(RQ3) MORE CASES: TESTING
VIDEO STREAMING APIS
58
[Figure: APIs organised into domains: a REST API generalises a Video Streaming API (implemented by YouTube, Vimeo, …) and a Music Streaming API (implemented by Spotify, Apple Music, …); each concrete API is connected through a processor]
Organise APIs into domains
Design metamorphic relations
• Applicable across platforms
Design test cases
• Reusable across platforms
Test and compare different platforms
MORE CASES: TESTING VIDEO
STREAMING APIS
59
MORE CASES:
YOUTUBE AND VIMEO
60
videostream input Features {
context VideoAPITest def: IsSearch: Boolean = request.oclIsTypeOf(SearchVideo)
context VideoAPITest def: IsUpdate: Boolean = request.oclIsTypeOf(UpdateVideo)
context SearchVideo def: MaxResults: Int = maxResults
context SearchVideo def: SearchOrder: Int = orderType
}
output Features {
NVideos : Long
OutputVideoId: Long
OutputVideoTitle: String
}
//...
MetamorphicRelations {
MR1 = [ (IsSearch(m1) and MaxResults(m1) >= MaxResults(m2)) implies (NVideos(m1) >= NVideos(m2))]
MR2 = [ (IsSearch(m1) and SearchOrder(m1) <> SearchOrder(m2)) implies (NVideos(m1) == NVideos(m2))]
MR3 = [ IsUpdate(m1) and m1 == m2 implies
(OutputVideoId(m1) <> OutputVideoId(m2)) and
(OutputVideoTitle(m1) == OutputVideoTitle(m2)) ]
}
TESTING PROCESS
Designed 30 test cases
• Automatically generated 120 follow-ups
All test cases could be reused for YouTube and Vimeo
Results:
• All tests for MR1 and MR3 passed
• ~7% of tests for MR2 failed in each platform:
we obtained a different number of videos for search queries with different ordering criteria
61
TESTING THE SIMULATOR OR THE
MODEL BEING SIMULATED?
We can build MRs to test either the simulator or the model
Let's consider MT for Deterministic Finite Automata (DFAs)
• Consider the MR: (w' == w.1) and Accept(dfa, w') implies not Accept(dfa, w)
• We are testing that the DFA model behaves according to our expectations
Instead, consider the MR:
• (FinalStates(dfa2) == States(dfa1)-FinalStates(dfa1))
implies Accept(dfa1, w1) != Accept(dfa2, w1)
(swapping final states yields complement language)
• Here we are testing a general property of DFAs that every simulator should fulfil
62
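A minimal, simulator-independent sketch of the second MR (the DFA encoding is illustrative): swapping final and non-final states must flip acceptance for every word:

object DfaComplementMR extends App {
  case class DFA(states: Set[Int], init: Int, finals: Set[Int], delta: Map[(Int, Char), Int]) {
    def accepts(w: String): Boolean =
      finals.contains(w.foldLeft(init)((q, c) => delta((q, c))))
  }

  // dfa1 accepts words over {0,1} with an even number of 1s
  val dfa1 = DFA(Set(0, 1), 0, Set(0),
                 Map((0, '0') -> 0, (0, '1') -> 1, (1, '0') -> 1, (1, '1') -> 0))
  val dfa2 = dfa1.copy(finals = dfa1.states -- dfa1.finals)   // follow-up: complemented final states

  val w = "1011"
  assert(dfa1.accepts(w) != dfa2.accepts(w), "MR violated")   // acceptance must differ on every word
}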
TESTING THE SIMULATOR OR
THE MODEL BEING SIMULATED?
63
Model
Input
MRs involving modifications of the model usually test the simulator
Simulator
TESTING THE SIMULATOR OR THE
MODEL BEING SIMULATED?
64
MRs involving modifications of the model's input usually test the model
Model
Input
Simulator
AI FOR METAMORPHIC
TESTING
Generative AI
Large Language Models (LLMs)
• Produce suitable answers out of natural language text
Extensively used today in many domains
API-based integration in applications
• Agent-based programming
• LangGraph, AutoGen, CrewAI, …
65
Text
Text
LLM
LLM
task1 task2
…
shared state
(prompt)
prompt
Structured
output
AI FOR METAMORPHIC
TESTING
Use large language models to help the tester
Assistive tasks
• Create an MR out of natural language text
• Explain an MR in natural language
• Simplify an existing MR
• Combine existing MRs
• Derive new MRs
• Derive N follow-up test cases of a model for an MR
• Derive initial test cases
• …
66
AI FOR METAMORPHIC
TESTING: STRATEGY
Prompt engineering
• Explanation of Gotten
• Grammar of Gotten (Xtext)
• Meta-model of the domain
• Current model
• Current selection in tool
• Specific prompt depending on task
Agent workflow
• Some tasks solved with the LLM
• Safeguards: error checking and repair
67
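A hedged sketch of the prompt-assembly step listed above (the prompt layout and parameter names are assumptions, not Gotten's actual code):

object PromptBuilder {
  def buildPrompt(gottenExplanation: String, grammar: String, metamodel: String,
                  currentModel: String, selection: String, taskPrompt: String): String =
    s"""$gottenExplanation
       |
       |Gotten grammar (Xtext):
       |$grammar
       |
       |Domain meta-model:
       |$metamodel
       |
       |Current model:
       |$currentModel
       |
       |Current selection:
       |$selection
       |
       |Task:
       |$taskPrompt""".stripMargin
}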
AGENT WORKFLOW TO CREATE MR
68
Task
Classification
Create MR
out of NL
Syntax
checker
MR fixer
LLM
…
openAI
…
(all other tasks)
Gotten
IDE
prompt
Gotten
code
Errors
Gotten
code
compilation
AGENT WORKFLOW TO CREATE FOLLOW UPS
69
Task
Classification
Create N
followups
EMF
checker
MR
checker
LLM
…
openAI
…
(all other tasks)
Gotten
IDE
prompt
EMF
model
Errors
syntactic
repair
errors? errors?
semantic
repair
Error
y
n
y
more?
n
Generated
models
y
n
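A hedged sketch of the generate/check/repair loop of this workflow (askLLM, parsesAsEmfModel and satisfiesMrPrecondition are hypothetical stand-ins for the LLM call, the EMF checker and the MR checker):

object FollowUpLoopSketch {
  def askLLM(prompt: String): String = ???                           // call the LLM (e.g. through an OpenAI client)
  def parsesAsEmfModel(text: String): Either[String, String] = ???   // Left(errors) or Right(valid EMF model)
  def satisfiesMrPrecondition(model: String): Either[String, Unit] = ???

  def generateFollowUp(prompt: String, maxRepairs: Int = 3): Option[String] = {
    var attempt = askLLM(prompt)
    var repairs = 0
    while (repairs < maxRepairs) {
      parsesAsEmfModel(attempt) match {
        case Left(errors) =>                                          // syntactic repair
          attempt = askLLM(s"$prompt\nThe previous model had syntax errors:\n$errors\nPlease fix them.")
        case Right(model) =>
          satisfiesMrPrecondition(model) match {
            case Right(_)     => return Some(model)                   // valid follow-up: done
            case Left(errors) =>                                      // semantic repair
              attempt = askLLM(s"$prompt\nThe model violates the MR pre-condition:\n$errors\nPlease fix it.")
          }
      }
      repairs += 1
    }
    None                                                              // give up after maxRepairs attempts
  }
}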
PRELIMINARY RESULTS FOR
FOLLOW UP GENERATION
70
MR                                                       N    Correct (syn)  Correct (sem)  Diff  Median (ms)
CPU(m1) > CPU(m2) and w1 == w2 implies …                 10   10             10             10    3296
NMachines(m1) > NMachines(m2) and w1 == w2 implies …     10   10             10             10    3351
Storage(m1) > Storage(m2) and w1 == w2 implies …         10   10             10             10    3606
Network(m1) > Network(m2) and w1 == w2 implies …         10   10             10             10    3448
Memory(m1) > Memory(m2) and w1 == w2 implies …           10   10             10             10    3088
Task: Generate N follow-up models from seed model and MR
Model: gpt-4.1-mini, temperature=0.7
Seed model size: 6 objects [1 rack]
PRELIMINARY RESULTS FOR
FOLLOW UP GENERATION
71
                                                              First loop                   Repair
MR                                                       N    Correct (sem)  Median (ms)   Correct (sem)  Median (ms)
CPU(m1) > CPU(m2) and w1 == w2 implies …                 10   8              6246          2              7889
NMachines(m1) > NMachines(m2) and w1 == w2 implies …     10   6              7982          4              5514
Storage(m1) > Storage(m2) and w1 == w2 implies …         10   8              7788          2              5569
Network(m1) > Network(m2) and w1 == w2 implies …         10   8              6488          2              6615
Memory(m1) > Memory(m2) and w1 == w2 implies …           10   9              6611          1              8946
Task: Generate N follow-up models from seed model and MR
Model: gpt-4.1-mini, temperature=0.7
Seed model size: 18 objects [4 racks]
COMMENTS… THE POSITIVE
Promising results
• All models syntactically correct at first generation
• All models semantically correct (are follow-ups) at first generation
Faster than using MoMOT
• Search-based transformation is very heavyweight
All follow-ups are different
• This requirement was included in the prompt
• However, all of them are structurally equal (they differ only in attribute values)
72
COMMENTS… THE CAVEATS
Caveats:
• Relatively simple MRs
• Simple seed models
Discussion
• Is this the right approach to follow-up generation?
• SAT solving/Search-based optimisation vs. Prompt + repair cycles
• Solid engineering vs. More fragile systems
• Measure structural diversity of solutions
• Handling constraints that could not be handled before (e.g., on Strings)?
73
CONCLUSIONS
Metamorphic testing helps to assess systems that are hard to test
• Simulators often fall into this category
Metamorphic relations
• Involve several inputs and their expected results
• Both as oracles and to generate follow-up test cases
The Gotten framework helps to create metamorphic testing environments
• Based on MDE principles
• Examples for cloud simulation
AI for metamorphic testing
• Agentic workflows based on LLMs to help in several tasks
74
OUTLOOK
Metamorphic testing in other domains (involving simulation)
• If you have a case study, let’s talk!
Systematic assessment of AI assistance quality
• Evaluation of each task
Metamorphic testing to assess conversational agents
• User simulation + metamorphic rules
75
MORE ABOUT GOTTEN
1. GOTTEN: A model-driven solution to engineer domain-specific metamorphic testing
environments. Pablo Gómez-Abajo, Pablo C. Cañizares, Alberto Núñez, Esther Guerra, Juan de
Lara. 2023. In ACM/IEEE 26th International Conference on Model Driven Engineering Languages
and Systems (MoDELS 2023), Västerås.
2. Automated engineering of domain-specific metamorphic testing environments. Pablo Gómez-
Abajo, Pablo C. Cañizares, Alberto Núñez, Esther Guerra, Juan de Lara. 2023. In Information and
Software Technology (Elsevier).
3. New ideas: Automated engineering of metamorphic testing environments for domain-specific
languages. Pablo C. Cañizares, Pablo Gómez-Abajo, Alberto Núñez, Esther Guerra, Juan de
Lara. 2021. In ACM SIGPLAN International Conference on Software Language Engineering (SLE
2021), Chicago. Best new ideas/vision paper award at SLE’21
76
https://g0tten.github.io/home.html
THANKS!
Questions?
Juan.deLara@uam.es
77
http://www.miso.es
modelling &
software engineering
research group
https://g0tten.github.io/home.html
More Related Content

Similar to AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION (20)

The power of AI and ML in Testing .
The power of AI and ML in Testing       .The power of AI and ML in Testing       .
The power of AI and ML in Testing .
tisnatom
 
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
Tao Xie
 
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
Agile Testing Alliance
 
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
Tao Xie
 
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Machine learning testing  survey, landscapes and horizons, the Cliff NotesMachine learning testing  survey, landscapes and horizons, the Cliff Notes
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Heemeng Foo
 
ReusingMT
ReusingMTReusingMT
ReusingMT
miso_uam
 
Test for AI model
Test for AI modelTest for AI model
Test for AI model
Arithmer Inc.
 
Model-based Testing Principles
Model-based Testing PrinciplesModel-based Testing Principles
Model-based Testing Principles
Henry Muccini
 
MODELS2013_MDHPCL_Presentation
MODELS2013_MDHPCL_PresentationMODELS2013_MDHPCL_Presentation
MODELS2013_MDHPCL_Presentation
Dionny Santiago
 
How Machine learning Integration supports testing automation in software
How Machine learning Integration supports testing automation in softwareHow Machine learning Integration supports testing automation in software
How Machine learning Integration supports testing automation in software
hamzaaftab25
 
Metamorphic Testing Thesis Defense.pptx
Metamorphic Testing Thesis Defense.pptxMetamorphic Testing Thesis Defense.pptx
Metamorphic Testing Thesis Defense.pptx
entertainmentweekly11
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Tao Xie
 
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
Jim Jimenez
 
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATIONTRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
ijseajournal
 
Integrating AI in software quality in absence of a well-defined requirements
Integrating AI in software quality in absence of a well-defined requirementsIntegrating AI in software quality in absence of a well-defined requirements
Integrating AI in software quality in absence of a well-defined requirements
Nagarro
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal Perspective
Lionel Briand
 
Meta-modeling: concepts, tools and applications
Meta-modeling: concepts, tools and applicationsMeta-modeling: concepts, tools and applications
Meta-modeling: concepts, tools and applications
Saïd Assar
 
Introduction to architectures based on models, models and metamodels. model d...
Introduction to architectures based on models, models and metamodels. model d...Introduction to architectures based on models, models and metamodels. model d...
Introduction to architectures based on models, models and metamodels. model d...
Vicente García Díaz
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software Testing
Lionel Briand
 
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
AIRCC Publishing Corporation
 
The power of AI and ML in Testing .
The power of AI and ML in Testing       .The power of AI and ML in Testing       .
The power of AI and ML in Testing .
tisnatom
 
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
Tao Xie
 
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
Agile Testing Alliance
 
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
Tao Xie
 
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Machine learning testing  survey, landscapes and horizons, the Cliff NotesMachine learning testing  survey, landscapes and horizons, the Cliff Notes
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Heemeng Foo
 
Model-based Testing Principles
Model-based Testing PrinciplesModel-based Testing Principles
Model-based Testing Principles
Henry Muccini
 
MODELS2013_MDHPCL_Presentation
MODELS2013_MDHPCL_PresentationMODELS2013_MDHPCL_Presentation
MODELS2013_MDHPCL_Presentation
Dionny Santiago
 
How Machine learning Integration supports testing automation in software
How Machine learning Integration supports testing automation in softwareHow Machine learning Integration supports testing automation in software
How Machine learning Integration supports testing automation in software
hamzaaftab25
 
Metamorphic Testing Thesis Defense.pptx
Metamorphic Testing Thesis Defense.pptxMetamorphic Testing Thesis Defense.pptx
Metamorphic Testing Thesis Defense.pptx
entertainmentweekly11
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Tao Xie
 
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
Jim Jimenez
 
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATIONTRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
ijseajournal
 
Integrating AI in software quality in absence of a well-defined requirements
Integrating AI in software quality in absence of a well-defined requirementsIntegrating AI in software quality in absence of a well-defined requirements
Integrating AI in software quality in absence of a well-defined requirements
Nagarro
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal Perspective
Lionel Briand
 
Meta-modeling: concepts, tools and applications
Meta-modeling: concepts, tools and applicationsMeta-modeling: concepts, tools and applications
Meta-modeling: concepts, tools and applications
Saïd Assar
 
Introduction to architectures based on models, models and metamodels. model d...
Introduction to architectures based on models, models and metamodels. model d...Introduction to architectures based on models, models and metamodels. model d...
Introduction to architectures based on models, models and metamodels. model d...
Vicente García Díaz
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software Testing
Lionel Briand
 
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
AIRCC Publishing Corporation
 

More from miso_uam (20)

Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
miso_uam
 
Model-driven engineering for AR
Model-driven engineering for ARModel-driven engineering for AR
Model-driven engineering for AR
miso_uam
 
Capone.pdf
Capone.pdfCapone.pdf
Capone.pdf
miso_uam
 
MLE_keynote.pdf
MLE_keynote.pdfMLE_keynote.pdf
MLE_keynote.pdf
miso_uam
 
Multi21
Multi21Multi21
Multi21
miso_uam
 
MLMPLs
MLMPLsMLMPLs
MLMPLs
miso_uam
 
Scientific writing
Scientific writingScientific writing
Scientific writing
miso_uam
 
Facets_UCM
Facets_UCMFacets_UCM
Facets_UCM
miso_uam
 
SLE_MIP08
SLE_MIP08SLE_MIP08
SLE_MIP08
miso_uam
 
mtATL
mtATLmtATL
mtATL
miso_uam
 
Máster en Métodos Formales en Ingeniería Informática
Máster en Métodos Formales en Ingeniería InformáticaMáster en Métodos Formales en Ingeniería Informática
Máster en Métodos Formales en Ingeniería Informática
miso_uam
 
Analysing-MMPLs
Analysing-MMPLsAnalysing-MMPLs
Analysing-MMPLs
miso_uam
 
Facets
FacetsFacets
Facets
miso_uam
 
kite
kitekite
kite
miso_uam
 
MTPLs
MTPLsMTPLs
MTPLs
miso_uam
 
Miso-McGill
Miso-McGillMiso-McGill
Miso-McGill
miso_uam
 
Model Transformation Reuse
Model Transformation ReuseModel Transformation Reuse
Model Transformation Reuse
miso_uam
 
DSLcomet
DSLcometDSLcomet
DSLcomet
miso_uam
 
MDE-experiments
MDE-experimentsMDE-experiments
MDE-experiments
miso_uam
 
keynote modelsward 2017
keynote modelsward 2017keynote modelsward 2017
keynote modelsward 2017
miso_uam
 
Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
miso_uam
 
Model-driven engineering for AR
Model-driven engineering for ARModel-driven engineering for AR
Model-driven engineering for AR
miso_uam
 
Capone.pdf
Capone.pdfCapone.pdf
Capone.pdf
miso_uam
 
MLE_keynote.pdf
MLE_keynote.pdfMLE_keynote.pdf
MLE_keynote.pdf
miso_uam
 
Scientific writing
Scientific writingScientific writing
Scientific writing
miso_uam
 
Facets_UCM
Facets_UCMFacets_UCM
Facets_UCM
miso_uam
 
Máster en Métodos Formales en Ingeniería Informática
Máster en Métodos Formales en Ingeniería InformáticaMáster en Métodos Formales en Ingeniería Informática
Máster en Métodos Formales en Ingeniería Informática
miso_uam
 
Analysing-MMPLs
Analysing-MMPLsAnalysing-MMPLs
Analysing-MMPLs
miso_uam
 
Miso-McGill
Miso-McGillMiso-McGill
Miso-McGill
miso_uam
 
Model Transformation Reuse
Model Transformation ReuseModel Transformation Reuse
Model Transformation Reuse
miso_uam
 
MDE-experiments
MDE-experimentsMDE-experiments
MDE-experiments
miso_uam
 
keynote modelsward 2017
keynote modelsward 2017keynote modelsward 2017
keynote modelsward 2017
miso_uam
 
Ad

Recently uploaded (20)

The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...
Prachi Desai
 
iOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod KumariOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod Kumar
Pramod Kumar
 
14 Years of Developing nCine - An Open Source 2D Game Framework
14 Years of Developing nCine - An Open Source 2D Game Framework14 Years of Developing nCine - An Open Source 2D Game Framework
14 Years of Developing nCine - An Open Source 2D Game Framework
Angelo Theodorou
 
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptxIMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
usmanch7829
 
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfHow to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
victordsane
 
Rebuilding Cadabra Studio: AI as Our Core Foundation
Rebuilding Cadabra Studio: AI as Our Core FoundationRebuilding Cadabra Studio: AI as Our Core Foundation
Rebuilding Cadabra Studio: AI as Our Core Foundation
Cadabra Studio
 
Leveraging Foundation Models to Infer Intents
Leveraging Foundation Models to Infer IntentsLeveraging Foundation Models to Infer Intents
Leveraging Foundation Models to Infer Intents
Keheliya Gallaba
 
Online Queue Management System for Public Service Offices [Focused on Municip...
Online Queue Management System for Public Service Offices [Focused on Municip...Online Queue Management System for Public Service Offices [Focused on Municip...
Online Queue Management System for Public Service Offices [Focused on Municip...
Rishab Acharya
 
Neuralink Templateeeeeeeeeeeeeeeeeeeeeeeeee
Neuralink TemplateeeeeeeeeeeeeeeeeeeeeeeeeeNeuralink Templateeeeeeeeeeeeeeeeeeeeeeeeee
Neuralink Templateeeeeeeeeeeeeeeeeeeeeeeeee
alexandernoetzold
 
Maintaining + Optimizing Database Health: Vendors, Orchestrations, Enrichment...
Maintaining + Optimizing Database Health: Vendors, Orchestrations, Enrichment...Maintaining + Optimizing Database Health: Vendors, Orchestrations, Enrichment...
Maintaining + Optimizing Database Health: Vendors, Orchestrations, Enrichment...
BradBedford3
 
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps CyclesFrom Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
Marjukka Niinioja
 
Micro-Metrics Every Performance Engineer Should Validate Before Sign-Off
Micro-Metrics Every Performance Engineer Should Validate Before Sign-OffMicro-Metrics Every Performance Engineer Should Validate Before Sign-Off
Micro-Metrics Every Performance Engineer Should Validate Before Sign-Off
Tier1 app
 
Build Smarter, Deliver Faster with Choreo - An AI Native Internal Developer P...
Build Smarter, Deliver Faster with Choreo - An AI Native Internal Developer P...Build Smarter, Deliver Faster with Choreo - An AI Native Internal Developer P...
Build Smarter, Deliver Faster with Choreo - An AI Native Internal Developer P...
WSO2
 
Best Inbound Call Tracking Software for Small Businesses
Best Inbound Call Tracking Software for Small BusinessesBest Inbound Call Tracking Software for Small Businesses
Best Inbound Call Tracking Software for Small Businesses
TheTelephony
 
Agile Software Engineering Methodologies
Agile Software Engineering MethodologiesAgile Software Engineering Methodologies
Agile Software Engineering Methodologies
Gaurav Sharma
 
Providing Better Biodiversity Through Better Data
Providing Better Biodiversity Through Better DataProviding Better Biodiversity Through Better Data
Providing Better Biodiversity Through Better Data
Safe Software
 
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdfThe Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
Varsha Nayak
 
Artificial Intelligence Applications Across Industries
Artificial Intelligence Applications Across IndustriesArtificial Intelligence Applications Across Industries
Artificial Intelligence Applications Across Industries
SandeepKS52
 
Why Indonesia’s $12.63B Alt-Lending Boom Needs Loan Servicing Automation & Re...
Why Indonesia’s $12.63B Alt-Lending Boom Needs Loan Servicing Automation & Re...Why Indonesia’s $12.63B Alt-Lending Boom Needs Loan Servicing Automation & Re...
Why Indonesia’s $12.63B Alt-Lending Boom Needs Loan Servicing Automation & Re...
Prachi Desai
 
Design by Contract - Building Robust Software with Contract-First Development
Design by Contract - Building Robust Software with Contract-First DevelopmentDesign by Contract - Building Robust Software with Contract-First Development
Design by Contract - Building Robust Software with Contract-First Development
Par-Tec S.p.A.
 
The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...
Prachi Desai
 
iOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod KumariOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod Kumar
Pramod Kumar
 
14 Years of Developing nCine - An Open Source 2D Game Framework
14 Years of Developing nCine - An Open Source 2D Game Framework14 Years of Developing nCine - An Open Source 2D Game Framework
14 Years of Developing nCine - An Open Source 2D Game Framework
Angelo Theodorou
 
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptxIMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
usmanch7829
 
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfHow to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
victordsane
 
Rebuilding Cadabra Studio: AI as Our Core Foundation
AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION

  • 13. HOW IS THIS RELATED TO SIMULATION? Create a set of MRs capturing expert knowledge in the simulators’ domain Create a set of seed test cases • Prototypical cloud models Use the MRs to create follow-up test cases • Use the MRs to derive cloud models that satisfy the MR’s pre-condition Use the MRs as oracles between seeds and follow-ups We can use metamorphic testing to benchmark simulators in a given field • Do they pass all expected MRs? 13
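To make this process concrete, here is a minimal sketch in Python (purely illustrative, not part of Gotten) of the seed/follow-up/oracle loop for a cloud simulator; run_simulator, the model dictionaries and the MR predicate are hypothetical stand-ins.

# Minimal MT loop for a simulator (illustrative sketch; run_simulator stands in for the real SUT)
def run_simulator(cloud_model, workload):
    # Pretend SUT: report the simulated execution time of the workload on the model
    return {"exec_time": sum(workload) / max(cloud_model["nodes"], 1)}

def follow_up(cloud_model):
    # Derive a follow-up input satisfying the MR pre-condition: fewer computing nodes
    return {**cloud_model, "nodes": max(cloud_model["nodes"] - 1, 1)}

def mr_fewer_nodes_not_faster(seed, follow, workload):
    # MR: same workload and fewer nodes => execution time should be greater or equal
    t_seed = run_simulator(seed, workload)["exec_time"]
    t_follow = run_simulator(follow, workload)["exec_time"]
    return t_follow >= t_seed

seeds = [{"nodes": 4}, {"nodes": 8}]          # prototypical cloud models
workload = [10.0, 20.0, 5.0]
print([mr_fewer_nodes_not_faster(s, follow_up(s), workload) for s in seeds])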
  • 14. HOWEVER… Building an MT environment for a domain is hard • Complex inputs: Parsing to describe metamorphic relations • Outputs: Parsing to check metamorphic relations • Metamorphic relations: • Express them in an understandable way • Flexibility to change and create new ones while testing • Automated generation of follow-ups • Statistics of the testing process • Failure rate of every metamorphic relation • Analysis of inputs making a metamorphic relation fail • Capabilities to compare several SUTs (e.g., benchmark simulators) 14
  • 15. OUR GOAL Facilitate the construction of MT environments for various domains • Model-driven engineering for this purpose • Use meta-modelling and transformations Practical framework • Gotten: an Eclipse plugin to build MT environments • Considers the comparison of alternative SUTs • Domain-independent: applicable to systems in different domains • Follow-up generation • Domain Specific Language for MRs 15 Facilitate the MT process by using AI • Conversational assistants based on LLMs • Help in deriving MRs, explaining MRs, generating follow-ups
  • 16. AGENDA Model-driven engineering • Cloud simulation as running example The Gotten framework • Benchmarking cloud simulators using MT AI-based MT • Assess its potential to help in the MT process 16
  • 17. MODEL-DRIVEN ENGINEERING Use models to automate the different phases of software development • Specification • Testing • Verification • Simulation • Code generation • … Models described using: • General purpose modelling languages (e.g., UML) • Domain-specific languages 17
  • 18. DOMAIN-SPECIFIC LANGUAGES (DSLs) Develop software with higher levels of quality and productivity Increasing the level of abstraction: models Fewer “accidental” details, notations, and language closer to the problem Models are not just documentation • They are used to generate code, they can be executed • Specific and well-understood domains 18
  • 19. DOMAIN-SPECIFIC LANGUAGES: PARTS Abstract syntax • Concepts and primitives of the domain • Described via a meta-model Concrete syntax • How the models are represented and interacted with • Textual, graphical, tabular, etc. Semantics • What models mean • Via transformations (to other models, to code, rewriting/simulation) 19 val sim = new Simulation // Power model for a host val simplePowerModel = new PowerModel { override def getPower(cpuUtilization: Double): Double = { 100 + (cpuUtilization * 200) } }…
  • 20. META-MODELS AND MODELS 20 ComputeNode name: String cpuCores: int ramGb: int n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 meta-model model csc1: Switch name=“sw1” ports=48 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes «conforms to» Meta-model • Primitives of the language, properties, relations • Conceptual model of the domain • ≈ class diagram (classes, attributes, associations) Model • Instance of the meta-model • ≈ object diagram (objects, slots, links) 0..1 switch * nodes :switch Switch name: String ports: int speed: double
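As a rough analogue (illustrative only; real Gotten meta-models are Ecore/EMF artefacts, not Python), the meta-model plays the role of class definitions and a conforming model the role of object instances:

# Meta-model ~ class definitions; model ~ object instances (names mirror the slide)
from dataclasses import dataclass
from typing import Optional

@dataclass
class Switch:
    name: str
    ports: int
    speed: float

@dataclass
class ComputeNode:
    name: str
    cpuCores: int
    ramGb: int
    switch: Optional[Switch] = None   # the 0..1 "switch" reference

# A model conforming to the meta-model
sw1 = Switch(name="sw1", ports=48, speed=100.0)
n1 = ComputeNode(name="ML infer", cpuCores=24, ramGb=256, switch=sw1)
n2 = ComputeNode(name="ML train", cpuCores=32, ramGb=512, switch=sw1)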
  • 21. FULLY PRECISE AT ALL TIMES: OCL INTEGRITY CONSTRAINTS 21 Meta-model • Integrity constraints, invariants • Object Constraint Language (OCL) «conforms to» positivePorts inv: self.ports > 0 enoughPorts inv: self.ports>= self.nodes->size() ComputeNode name: String cpuCores: int ramGb: int meta-model 0..1 switch * nodes n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 model csc1: Switch name=“sw1” ports=1 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes :switch Model • Should satisfy each invariant Switch name: String ports: int speed: double
  • 22. FULLY PRECISE AT ALL TIMES: OCL INTEGRITY CONSTRAINTS 22 Meta-model • Integrity constraints, invariants • Object Constraint Language (OCL) «conforms to» ComputeNode name: String cpuCores: int ramGb: int meta-model Switch name: String ports: int speed: double 0..1 switch * nodes n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 model csc1: Switch name=“sw1” ports=48 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes :switch Model • Should satisfy each invariant positivePorts inv: self.ports > 0 enoughPorts inv: self.ports>= self.nodes->size()
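A rough Python analogue of those two OCL invariants (again only an illustration of what they check):

# positivePorts inv: self.ports > 0
def positive_ports(switch):
    return switch["ports"] > 0

# enoughPorts inv: self.ports >= self.nodes->size()
def enough_ports(switch, nodes):
    return switch["ports"] >= len(nodes)

sw1 = {"name": "sw1", "ports": 1}
nodes = ["ML infer", "ML train"]
print(positive_ports(sw1), enough_ports(sw1, nodes))  # True False: the 1-port model violates enoughPorts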
  • 23. DSLs : CONCRETE SYNTAX 23 (conditional style) Graphical concrete syntax specification Abstract syntax Concrete syntax ComputeNode name: String cpuCores: int ramGb: int n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 meta-model model csc1: Switch name=“sw1” ports=48 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes «conforms to» Switch ports: int speed: double 0..1 switch * nodes :switch «name» «cpuCores» cores «ramGb» Gb RAM «ports» ports «speed» G ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 48 ports 100 G ports==nodes.size() «name» sw1
  • 24. DSLs : CONCRETE SYNTAX 24 (conditional style) Graphical concrete syntax specification Abstract syntax Concrete syntax ComputeNode name: String cpuCores: int ramGb: int n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 meta-model model csc1: Switch name=“sw1” ports=2 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes «conforms to» Switch ports: int speed: double 0..1 switch * nodes :switch «name» «cpuCores» cores «ramGb» Gb RAM «ports» ports «speed» G ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G ports==nodes.size() «name» sw1
  • 25. DSLs : CONCRETE SYNTAX 25 Textual syntax specification Concrete syntax Node “ML infer” { cpuCores: 24 ram: 256 switch: sw1 } Node “ML train” { cpuCores: 32 ram: 512 } ComputeNode returns ComputeNode: ‘Node’ name=String ‘{‘ ‘cpuCores:’ cpuCores=Int ‘ram:’ ramGb=Int (‘switch:’ switch=[Switch|String])? ‘}’ … Abstract syntax ComputeNode name: String cpuCores: int ramGb: int n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 meta-model model csc1: Switch name=“sw1” ports=48 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes «conforms to» Switch ports: int speed: double 0..1 switch * nodes :switch Switch sw1 { ports: 48 speed: 100 nodes: “ML infer”, “ML train” }
  • 26. DSL-BASED AUTOMATION SOLUTIONS 26 Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 What can we do with such a model?
  • 27. MODEL-TO-MODEL TRANSFORMATION 27 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Create another model, possibly conforming to a different meta-model • Use that other model for analysis or simulation • Results can be transferred back to the original model (back-annotation)
  • 28. IN-PLACE MODEL TRANSFORMATION 28 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator task time=0 Rewrite the model • Simulation or animation • Optimisation or refactoring
  • 29. 29 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator task time=0.1 IN-PLACE MODEL TRANSFORMATION Rewrite the model • Simulation or animation • Optimisation or refactoring
  • 30. 30 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator time=0.4 IN-PLACE MODEL TRANSFORMATION Rewrite the model • Simulation or animation • Optimisation or refactoring
  • 31. MODEL-TO-TEXT TRANSFORMATION 31 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator time=0.4 M2text transformation Text/Code The following is the data-center physical model: Node “ML infer” has 24 cores and 256 GB of RAM Node “ML train” has 32 cores and 512 GB of RAM All nodes are connected to switch sw1, with 2 ports And speed of 100G Generate text from the model • Code generation • Documentation, HTML pages • Prompts for an LLM • …
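A minimal model-to-text sketch of the kind of template that produces the text above (illustrative; in an Eclipse/EMF setting this would typically be written with a template language such as Acceleo or Xtend):

# Generate a textual description (e.g., an LLM prompt) from the model
def to_text(nodes, switch):
    lines = ["The following is the data-center physical model:"]
    for n in nodes:
        lines.append(f'Node "{n["name"]}" has {n["cpuCores"]} cores and {n["ramGb"]} GB of RAM')
    lines.append(f'All nodes are connected to switch {switch["name"]}, with {switch["ports"]} ports '
                 f'and speed of {switch["speed"]}G')
    return "\n".join(lines)

model = {"nodes": [{"name": "ML infer", "cpuCores": 24, "ramGb": 256},
                   {"name": "ML train", "cpuCores": 32, "ramGb": 512}],
         "switch": {"name": "sw1", "ports": 2, "speed": 100}}
print(to_text(model["nodes"], model["switch"]))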
  • 32. DSL-BASED AUTOMATION SOLUTIONS 32 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator time=0.4 M2text transformation Text/Code The following is the data-center physical model: Node “ML infer” has 24 cores and 256 GB of RAM Node “ML train” has 32 cores and 512 GB of RAM All nodes are connected to switch sw1, with 2 ports And speed of 100G Conversational assistant Is there direct communication between nodes ML infer and ML train? Generate text from the model • Code generation • Documentation, HTML pages • Prompts for an LLM • …
  • 33. DSL-BASED AUTOMATION SOLUTIONS 33 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator time=0.4 M2text transformation Text/Code The following is the data-center physical model: Node “ML infer” has 24 cores and 256 GB of RAM Node “ML train” has 32 cores and 512 GB of RAM All nodes are connected to switch sw1, with 2 ports And speed of 100G Conversational assistant Yes, they are both connected to switch sw1, which implies the latency would be minimal Generate text from the model • Code generation • Documentation, HTML pages • Prompts for an LLM • …
  • 34. MDE FOR METAMORPHIC TESTING We can use MDE to facilitate the construction of MT environments DSLs to describe the inputs • Cloud models Model-to-text transformations • Generate input textual artefacts for the cloud simulator(s) DSL for expressing the metamorphic relations • Use OCL for expressing features of the input model • Parse the SuT outputs to obtain output features Use search-based model transformation for follow-up generation 34
  • 35. MDE FOR METAMORPHIC TESTING 35 [Process diagram] The application expert creates the domain meta-models and defines the SuT(s) execution (via an EMF extension point). The domain expert defines the MRs (mrDSL) and fine-tunes follow-up generation (fowDSL). The tester creates input test cases, generates follow-ups and runs the metamorphic testing; if the results are not satisfactory, the MRs are redefined and the process iterates. All steps are supported by GOTTEN.
  • 36. A (SIMPLE) META-MODEL FOR CLOUD SYSTEMS 36
  • 37. A (SIMPLE) META-MODEL FOR CLOUD SYSTEMS 37 «conforms to»
  • 38. DSL FOR METAMORPHIC RELATIONS 38 metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] }
  • 39. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 39 • Meta-models of the SuT inputs • Model variables
  • 40. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 40 • Domain features of input models • Expressed in OCL
  • 41. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 41 • Features of execution outputs • Extracted from the executions
  • 42. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 42 Characterisation of alternative SUTs (like different simulators)
  • 43. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 43 Metamorphic relations: • Can use input and output features • Can use the model variables (m1, m2, w1, w2) defined above
  • 44. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 44 MR1: if the cloud m1 has more compute nodes than m2 and workloads are equal then the cloud m1 takes less time (or equal) to process the workload than m2
  • 45. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 45 MR2: if the cloud m1 has better CPUs than m2 and workloads are equal then the cloud m1 consumes less energy (or equal) to process the workload than m2
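Outside the DSL, the semantics of such a relation reduces to an implication over input and output features. A minimal sketch (illustrative only, not how Gotten evaluates MRs internally):

# Evaluating MR1 as an implication over precomputed features (illustrative)
def implies(p, q):
    return (not p) or q

def mr1(nnodes_m1, nnodes_m2, same_workload, time_m1, time_m2):
    # MR1: (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2))
    return implies(nnodes_m1 > nnodes_m2 and same_workload, time_m1 <= time_m2)

print(mr1(nnodes_m1=64, nnodes_m2=32, same_workload=True, time_m1=120, time_m2=150))  # True
print(mr1(nnodes_m1=64, nnodes_m2=32, same_workload=True, time_m1=200, time_m2=150))  # False: MR violated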
  • 46. PROCESSORS Specify • How to transform the input models to the SuT input • How to run the SuT on specific inputs • How to parse the output to obtain the output features (e.g., Energy, Time) Done via extension points defined by Gotten • Eclipse mechanism for extensibility 46 [Diagram: Processor «interface» with operations generate(...), execute(...), getFeatures(…), «implements»-ed by each system under test]
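The shape of that interface, sketched in Python for readability (the actual Gotten processors are contributed in Java through an Eclipse extension point; names and signatures here are illustrative):

# Sketch of the processor concept; one implementation per system under test
from abc import ABC, abstractmethod

class Processor(ABC):
    @abstractmethod
    def generate(self, model_path: str) -> str:
        """Transform the input model into the SUT's native input format."""

    @abstractmethod
    def execute(self, sut_input: str) -> str:
        """Run the SUT on the generated input and return its raw output."""

    @abstractmethod
    def get_features(self, raw_output: str) -> dict:
        """Parse the raw output into output features (e.g., Time, Energy)."""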
  • 47. FOLLOW UPS 47 [Diagram] The MR and the follow-up generation DSL yield transformation rules and a search configuration for MoMOT, which takes the seed models and produces the follow-ups. Martin Fleck, Javier Troya, Manuel Wimmer: Search-Based Model Transformations with MOMoT. ICMT 2016: 79-87
  • 48. FOLLOW-UPS: FOWDSL 48 // MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] // context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() followups for datacentre using MR1 with source folder = "/dcmodels" and output folder = "/dcmodels" NNodes -> delete [1..10] Rack; delete [1..10] Board; decrease [1..10] Rack.numBoards; decrease [1..10] Board.nodesPerBoard maxSolutions 10 iterations 10 algorithms [Random, NSGAII, NSGAIII, eMOEA]
  • 49. FOLLOW-UPS: FOWDSL 49 followups for datacentre using MR1 with source folder = "/dcmodels" and output folder = "/dcmodels" NNodes -> decrease [1..4] Rack.numBoards keeping {Rack.numBoards > 0}; decrease [1..4] Board.nodesPerBoard keeping {Board.nodesPerBoard > 0} maximize ( NNodes(m2) - NNodes(m1) ) maxSolutions 10 iterations 8 algorithms [Random, NSGAII, NSGAIII, eMOEA]
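Conceptually, follow-up generation mutates the seed model so that the MR pre-condition holds while respecting the keeping constraints. A toy sketch (illustrative; the real pipeline delegates this to search-based transformation with MoMOT):

# Follow-up generation by mutation (conceptual sketch only)
import copy, random

def n_nodes(dc):
    return sum(r["numBoards"] * r["nodesPerBoard"] for r in dc["racks"])

def follow_ups(seed, how_many=10, rng=random.Random(0)):
    # Produce models with strictly fewer nodes, keeping numBoards/nodesPerBoard > 0
    result = []
    while len(result) < how_many:
        m2 = copy.deepcopy(seed)
        rack = rng.choice(m2["racks"])
        attr = rng.choice(["numBoards", "nodesPerBoard"])
        if rack[attr] > 1:                       # keeping {... > 0}
            rack[attr] -= rng.randint(1, min(4, rack[attr] - 1))
        if n_nodes(m2) < n_nodes(seed):          # satisfies the MR1 pre-condition
            result.append(m2)
    return result

seed = {"racks": [{"numBoards": 4, "nodesPerBoard": 8}, {"numBoards": 2, "nodesPerBoard": 16}]}
print(len(follow_ups(seed)), n_nodes(seed))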
  • 51. GOTTEN 51 Open source, freely available at https://g0tten.github.io/home.html
  • 52. ASSESSING THE APPROACH 52 RQ1: How effective is Gotten at specifying MT environments? • Is it better than specifying the environment by hand? RQ2: How suitable are the environments built with Gotten? • Can we perform MT effectively? RQ3: Can Gotten be used to create MT environments for different domains? • Video streaming • Automata
  • 53. (RQ1) EFFECTIVENESS IN SPECIFYING MT ENVIRONMENTS 53 Cloud simulation domain • Cost of integrating 2 cloud simulators (Dissect, CloudSimStorage) within Gotten • Baseline: ad-hoc MT testing tool called FwCloudMeT FwCloudMeT • Fixed set of MRs • Native follow-up generation Gotten • Extensible set of MRs (DSL) • Follow-ups via MoMOT One order of magnitude less code in Gotten
  • 54. (RQ2) SUITABILITY OF GOTTEN FOR MT 54 Case study: cloud simulation Definition of MRs • 6 MRs Generating follow-ups • 200 follow ups for the 6 MRs • ~11 secs per follow-up MT process • 4 passing MRs for both simulators • Both passing and failing test cases for 1 MR • 1 failing MR for both simulators
  • 55. MRs FOR CLOUD SIMULATION 55 better CPU → less energy; machine increase → increase ratio bigger than energy ratio; better storage → less time
  • 56. 56 MRs FOR CLOUD SIMULATION better network → less time; better memory → less time; shorter workload → less time
  • 57. MR5 (“better memory implies less or equal processing time”) • Neither of the two simulators satisfies this relation • Limitation in the handling of the memory system MR2 (“The proportional increase in machines should be greater than or equal to the proportional increase in energy usage”) • Some scenarios do not meet this expectation • Some idle machines, inefficient scheduling 57 TESTING PROCESS
  • 58. (RQ3) MORE CASES: TESTING VIDEO STREAMING APIS 58 [Diagram: domains are API families (Video Streaming API, Music Streaming API, … are REST APIs); processors are the platforms that implement them (YouTube, Vimeo, … and Spotify, Apple Music, …)] Organise APIs into domains Design metamorphic relations • Applicable across platforms Design test cases • Reusable across platforms Test and compare different platforms
  • 59. MORE CASES: TESTING VIDEO STREAMING APIS 59 "Perform a search, and repeat the search with different search order. Then, the number of videos of each search must be the same" Search: 'world cup' Order: most rated 1- Argentina v France | 19-12-2022 2- Shakira: Waka-Waka | 05-06-2010 3- Spain, road to world cup | 17-09-2020 4- FIFA world cup 2018 | 01-09-2018 5-Brazil vs. France 2006 | 08-08-2006 Search: 'world cup' Order: oldest 1- Brazil vs. France 2006 | 08-08-2006 2- Shakira: Waka-Waka | 05-06-2010 3- Spain, road to world cup | 17-09-2010 4- FIFA world cup 2018 | 01-09-2018 5-Argentina v France | 19-12-2022 Property: Search-1: Search-2: NumVideos(Search-1) == NumVideos(Search-2) ?
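The check itself is simple once search results can be retrieved; a toy sketch with an in-memory stand-in for the platform's search endpoint (search_videos is hypothetical and ignores the query; a real test would call the YouTube/Vimeo REST APIs):

# MR: same query, different ordering criterion => same number of results
VIDEOS = [("Argentina v France", "2022-12-19", 9.8),
          ("Shakira: Waka-Waka", "2010-06-05", 9.5),
          ("FIFA world cup 2018", "2018-09-01", 8.7)]

def search_videos(query, order, max_results=5):
    # Toy stub standing in for a REST search call (ignores the query)
    key = {"rating": lambda v: -v[2], "date": lambda v: v[1]}[order]
    return sorted(VIDEOS, key=key)[:max_results]

def mr_order_invariant_count(query, max_results=5):
    by_rating = search_videos(query, "rating", max_results)
    by_date = search_videos(query, "date", max_results)
    return len(by_rating) == len(by_date)

print(mr_order_invariant_count("world cup"))  # True on this toy data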
  • 60. MORE CASES: YOUTUBE AND VIMEO 60 videostream input Features { context VideoAPITest def: IsSearch: Boolean = request.oclIsTypeOf(SearchVideo) context VideoAPITest def: IsUpdate: Boolean = request.oclIsTypeOf(UpdateVideo) context SearchVideo def: MaxResults: Int = maxResults context SearchVideo def: SearchOrder: Int = orderType } output Features { NVideos : Long OutputVideoId: Long OutputVideoTitle: String } //... MetamorphicRelations { MR1 = [ (IsSearch(m1) and MaxResults(m1) >= MaxResults(m2)) implies (NVideos(m1) >= NVideos(m2))] MR2 = [ (IsSearch(m1) and SearchOrder(m1) <> SearchOrder(m2)) implies (NVideos(m1) == NVideos(m2))] MR3 = [ IsUpdate(m1) and m1 == m2 implies (OutputVideoId(m1) <> OutputVideoId(m2)) and (OutputVideoTitle(m1) == OutputVideoTitle(m2)) ] }
  • 61. TESTING PROCESS Designed 30 test cases • Automatically generated 120 follow-ups All test cases could be reused for YouTube and Vimeo Results: • All tests for MR1 and MR3 passed • ~7% of tests for MR2 failed on each platform: we obtained a different number of videos for search queries with different ordering criteria 61
  • 62. TESTING THE SIMULATOR OR THE MODEL BEING SIMULATED? We can build MRs to test either the simulator or the model Let’s consider MT for Deterministic Finite Automata (DFA) • Consider the MR: (w’ == w.1) and Accept(dfa, w’) implies not Accept(dfa, w) • We are testing that the DFA model behaves according to our expectations Instead, consider the MR: • (FinalStates(dfa2) == States(dfa1)-FinalStates(dfa1)) implies Accept(dfa1, w1) != Accept(dfa2, w1) (swapping final states yields the complement language) • Here we are testing a general property of DFAs that every simulator should fulfil 62
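The second MR, swapping final states to obtain the complement language, is easy to illustrate with a toy DFA (a Python sketch; any correct simulator must give opposite answers on the two automata):

# "Swap final states => complement language" MR for DFAs (illustrative)
def accepts(dfa, word):
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["finals"]

def complement(dfa):
    return {**dfa, "finals": dfa["states"] - dfa["finals"]}

# Toy DFA over {0,1} accepting words with an even number of 1s
dfa1 = {"states": {"even", "odd"}, "start": "even", "finals": {"even"},
        "delta": {("even", "0"): "even", ("even", "1"): "odd",
                  ("odd", "0"): "odd", ("odd", "1"): "even"}}
dfa2 = complement(dfa1)

for w in ["", "1", "10", "1101"]:
    # MR: the two automata must give opposite answers on every word
    assert accepts(dfa1, w) != accepts(dfa2, w)
print("MR holds on all sample words")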
  • 63. TESTING THE SIMULATOR OR THE MODEL BEING SIMULATED? 63 Model Input MRs involving modifications of the model usually test the simulator Simulator
  • 64. TESTING THE SIMULATOR OR THE MODEL BEING SIMULATED? 64 MRs involving modifications of the model’s input usually test the model Model Input Simulator
  • 65. AI FOR METAMORPHIC TESTING Generative AI Large Language Models (LLMs) • Produce suitable answers out of natural language text Extensively used today in many domains API-based integration in applications • Agent-based programming • LangGraph, AutoGen, CrewAI, … 65 Text Text LLM LLM task1 task2 … shared state (prompt) prompt Structured output
  • 66. AI FOR METAMORPHIC TESTING Use large language models to help the tester Assistive tasks • Create an MR out of natural language text • Explain an MR in natural language • Simplify an existing MR • Combine existing MRs • Derive new MRs • Derive N follow-up test cases of a model for an MR • Derive initial test cases • … 66
  • 67. AI FOR METAMORPHIC TESTING: STRATEGY Prompt engineering • Explanation of Gotten • Grammar of Gotten (Xtext) • Meta-model of the domain • Current model • Current selection in tool • Specific prompt depending on task Agent workflow • Some tasks solved with the LLM • Safeguards: error checking and repair 67
  • 68. AGENT WORKFLOW TO CREATE MR 68 [Workflow diagram] The Gotten IDE sends the prompt to the LLM (e.g., OpenAI); a task-classification step routes MR-creation requests to a “create MR out of NL” task (all other tasks are handled separately); the generated Gotten code goes through a syntax checker; reported errors are passed to an MR fixer, which iterates with the LLM until the Gotten code compiles
  • 69. AGENT WORKFLOW TO CREATE FOLLOW UPS 69 [Workflow diagram] The Gotten IDE sends the prompt to the LLM (e.g., OpenAI); a task-classification step routes the request to a “create N follow-ups” task (all other tasks are handled separately); each generated EMF model is passed to an EMF checker (errors trigger a syntactic-repair loop) and then to an MR checker (errors trigger a semantic-repair loop); candidates that cannot be repaired are reported as errors; the cycle repeats while more follow-ups are needed, and the correct candidates become the generated models
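A conceptual sketch of that generate/check/repair cycle (illustrative only: call_llm and both checkers are dummy placeholders, and the real assistant works on EMF models and the Gotten DSL through OpenAI models):

# Generate/check/repair loop for follow-up creation (all components are dummy stand-ins)
def call_llm(prompt):
    # Placeholder: a real implementation would call an LLM API here
    return "rack numBoards=3 nodesPerBoard=8\n---\nrack numBoards=2 nodesPerBoard=8"

def syntactic_errors(model_text):
    # Placeholder for the EMF conformance check
    return []

def semantic_errors(model_text, mr):
    # Placeholder for the MR pre-condition check on the candidate follow-up
    return []

def generate_follow_ups(seed_text, mr, n=2, max_repairs=2):
    models = []
    prompt = f"Generate {n} follow-up models for MR '{mr}' from:\n{seed_text}"
    for candidate in call_llm(prompt).split("\n---\n"):   # assumed response format
        for _ in range(max_repairs + 1):
            errors = syntactic_errors(candidate) or semantic_errors(candidate, mr)
            if not errors:
                models.append(candidate)
                break
            candidate = call_llm(f"Fix these errors:\n{errors}\nin:\n{candidate}")
    return models

print(generate_follow_ups("rack numBoards=4 nodesPerBoard=8", "NNodes(m1) > NNodes(m2)"))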
  • 70. PRELIMINARY RESULTS FOR FOLLOW UP GENERATION 70 Task: Generate N follow-up models from seed model and MR. Model: gpt-4.1-mini, temperature=0.7. Seed model size: 6 objects [1 rack]
    MR                                                   | N  | Correct (syn) | Correct (sem) | Diff | Median (ms)
    CPU(m1) > CPU(m2) and w1 == w2 implies …             | 10 | 10            | 10            | 10   | 3296
    NMachines(m1) > NMachines(m2) and w1 == w2 implies … | 10 | 10            | 10            | 10   | 3351
    Storage(m1) > Storage(m2) and w1 == w2 implies…      | 10 | 10            | 10            | 10   | 3606
    Network(m1) > Network(m2) and w1 == w2 implies…      | 10 | 10            | 10            | 10   | 3448
    Memory(m1) > Memory(m2) and w1 == w2 implies…        | 10 | 10            | 10            | 10   | 3088
  • 71. PRELIMINARY RESULTS FOR FOLLOW UP GENERATION 71 Task: Generate N follow-up models from seed model and MR. Model: gpt-4.1-mini, temperature=0.7. Seed model size: 18 objects [4 racks]
    MR                                                   | N  | First loop: Correct (sem) | First loop: Median (ms) | Repair: Correct (sem) | Repair: Median (ms)
    CPU(m1) > CPU(m2) and w1 == w2 implies …             | 10 | 8 | 6246 | 2 | 7889
    NMachines(m1) > NMachines(m2) and w1 == w2 implies … | 10 | 6 | 7982 | 4 | 5514
    Storage(m1) > Storage(m2) and w1 == w2 implies…      | 10 | 8 | 7788 | 2 | 5569
    Network(m1) > Network(m2) and w1 == w2 implies…      | 10 | 8 | 6488 | 2 | 6615
    Memory(m1) > Memory(m2) and w1 == w2 implies…        | 10 | 9 | 6611 | 1 | 8946
  • 72. COMMENTS… THE POSITIVE Promising results • All models syntactically correct at first generation • All models semantically correct (are follow-ups) at first generation Faster than using MoMOT • Search-based transformation is very heavyweight All follow-ups are different • This requirement was included in the prompt • But all of them are structurally equal (they differ only in attribute values) 72
  • 73. COMMENTS… THE CAVEATS Caveats: • Relatively simple MRs • Simple seed models Discussion • Is this the right approach to follow-up generation? • SAT solving/Search-based optimisation vs. Prompt + repair cycles • Solid engineering vs. More fragile systems • Measure structural diversity of solutions • Handling constraints that could not be handled before (e.g., on Strings)? 73
  • 74. CONCLUSIONS Metamorphic testing helps assess systems that are hard to test • Simulators often fall in this category Metamorphic relations • Involve several inputs and their expected results • Both as oracles and to generate follow-up test cases The Gotten framework helps create metamorphic testing environments • Based on MDE principles • Examples for cloud simulation AI for metamorphic testing • Agentic workflows based on LLMs to help in several tasks 74
  • 75. OUTLOOK Metamorphic testing in other domains (involving simulation) • If you have a case study, let’s talk! Systematic assessment of AI assistance quality • Evaluation of each task Metamorphic testing to assess conversational agents • User simulation + metamorphic rules 75
  • 76. MORE ABOUT GOTTEN 1. GOTTEN: A model-driven solution to engineer domain-specific metamorphic testing environments. Pablo Gómez-Abajo, Pablo C. Cañizares, Alberto Núñez, Esther Guerra, Juan de Lara. 2023. In ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MoDELS 2023), Västerås. 2. Automated engineering of domain-specific metamorphic testing environments. Pablo Gómez-Abajo, Pablo C. Cañizares, Alberto Núñez, Esther Guerra, Juan de Lara. 2023. In Information and Software Technology (Elsevier). 3. New ideas: Automated engineering of metamorphic testing environments for domain-specific languages. Pablo C. Cañizares, Pablo Gómez-Abajo, Alberto Núñez, Esther Guerra, Juan de Lara. 2021. In ACM SIGPLAN International Conference on Software Language Engineering (SLE 2021), Chicago. Best new ideas/vision paper award at SLE’21 76 https://g0tten.github.io/home.html