AI-ASSISTED METAMORPHIC
TESTING FOR DOMAIN-SPECIFIC
MODELLING AND SIMULATION
Computer Science Department
Universidad Autónoma de Madrid (Spain) http://miso.es
Juan de Lara
joint work with
Pablo Gómez-Abajo, Pablo C. Cañizares, Esther Guerra and Alberto Núñez
WHERE I COME FROM
Universidad Autónoma de Madrid
• Established in 1968
• North part of Madrid (campus Cantoblanco)
• One of the top universities in Spain
• >30000 students
• “Excellence” campus with CSIC (Spanish Research Council)
Computer Science and Telecom. Engineering
• Created in 1992
• 96 full-time professors
• Joint diploma Comp.Sci.-Maths
2
MISO GROUP
Modelling and software engineering research group
5 professors, 4 research associates, 3 PhD students
Automate software construction
• Model-driven engineering
• Domain-specific languages
• AI – LLMs
• Testing
3
http://miso.es
TODAY’S TALK
How to test software that is hard to test because:
• Unclear oracles (exact expected output for a given input)
• Complex input data (e.g., model-like)
Examples
• Machine learning systems
• Autonomous driving systems
• Compilers
• Image/signal processing software
• Scientific computing applications
• Chess engines
• …
Often, simulators fall into this category
• Climate, physics engines, economic models, cloud simulators
4
METAMORPHIC TESTING (MT)
5
Imagine we need to test the software for a calculator
Let's start with the sin function
in=31.5,
expected=0.5224
…
in=-1.2,
expected=−0.0209
System under test
(SUT) Test suite
Test suite = collection of test cases
Test case = input data + oracle
(expected outputs)
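For illustration (not from the slides), a conventional test case for sin could look like this in Scala, with the exact expected output hard-coded as the oracle:

object SinExactOracleTest extends App {
  val input    = 31.5                                // degrees, as in the slide
  val expected = 0.5224                              // exact oracle value
  val actual   = math.sin(math.toRadians(input))     // run the SUT
  assert(math.abs(actual - expected) < 1e-3, s"sin($input) was $actual, expected $expected")
  println("test passed")
}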
METAMORPHIC TESTING (MT)
6
in=31.5
SUT
Input data
Test execution
out=0.5224
Oracle
Output
0.5224
in=31.5,
expected=0.5224
…
in=-1.2,
expected=−0.0209
Test suite
==
Test passes
METAMORPHIC TESTING (MT)
How to build a test suite?
7
Test case input Test case oracle
SUT
31.5 function sine
-1.2
sin(31.5)==0.522498565 ?
sin(-1.2)==-0.0209... ?
How do we include an input value that is not in the test suite?
If the mechanism that computes these expected values (the oracle) is not available, then we
run into the oracle problem
Test case result
Oracle (exact expected output) vs partial oracle
METAMORPHIC TESTING (MT)
8
How does MT alleviate this challenge?
• Rather than using specific values, MT is based on underlying knowledge of the SUT:
Use trigonometric properties of the sin function, like:
−sin(x) = sin(-x)
Test case input Test case oracle
SUT
77 function sin -sin (77)
==
sin (-77)
Test case result
METAMORPHIC TESTING (MT)
9
This allows us to easily generate test cases with oracles by:
• Taking any input x [source test case]
• Generating another input -x [follow-up test case]
• Execute sine on x
• Execute sine on -x
• Compare results
The sine trigonometric property is a Metamorphic Relation (MR)
• Involves checking properties of two or more system inputs [x2 = -x1]
• And testing if the outputs are related as expected [-sin(x1) == sin(x2)]
• MR: [x2 == -x1] implies [-sin(x1) == sin(x2)]
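As a minimal sketch (assuming a simple wrapper around math.sin as the SUT), the MR can be turned into an executable check that needs no exact expected value:

object SinMetamorphicTest extends App {
  def sut(x: Double): Double = math.sin(math.toRadians(x))   // system under test

  val x1 = 77.0                // source test case (any input works)
  val x2 = -x1                 // follow-up test case derived by the MR: x2 == -x1

  val out1 = sut(x1)
  val out2 = sut(x2)

  // MR oracle: -sin(x1) == sin(x2), up to floating-point error
  assert(math.abs(-out1 - out2) < 1e-9, "metamorphic relation violated")
  println(s"MR holds: -($out1) == $out2")
}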
METAMORPHIC TESTING’S
ORACLE
10
The oracle uses multiple inputs
in1=77
SUT
Input data
out1=0.9744
Outputs
in2=-77 out2=-0.9744
Oracle
-out1 == out2?
HOW IS THIS RELATED TO
SIMULATION?
Imagine we'd like to test a cloud simulator
11
Cloud
model +
workload
input
Report
(exec. time)
output
What’s the oracle here?
Cloud
Simulator
HOW IS THIS RELATED TO
SIMULATION?
Metamorphic
testing to the
rescue!
12
Cloud Model
+
workload
input1
Report
(exec. time)
output1
Cloud model
(fewer computing nodes)
+
same workload
input2
Report
(exec. time)
output2
The time reported for input1 (the full model) should be lower than or equal to the one for input2
Cloud Simulator
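A hedged sketch of this MR as an executable oracle (CloudModel, Report and runSimulator are illustrative stand-ins, not the real simulator API):

object CloudMrSketch extends App {
  case class CloudModel(numNodes: Int)              // simplified input model
  case class Report(execTime: Double)               // simplified simulator report

  // hypothetical wrapper around the cloud simulator under test
  def runSimulator(model: CloudModel, workload: String): Report =
    Report(execTime = 100.0 / model.numNodes)       // stand-in behaviour for the real SUT

  val m1 = CloudModel(numNodes = 8)                 // source model
  val m2 = CloudModel(numNodes = 4)                 // follow-up: fewer nodes, same workload
  val out1 = runSimulator(m1, "workload")
  val out2 = runSimulator(m2, "workload")

  // MR oracle: more nodes and same workload implies lower or equal execution time
  assert(out1.execTime <= out2.execTime, "MR violated")
}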
HOW IS THIS RELATED TO
SIMULATION?
Create a set of MRs capturing expert knowledge in the simulators’ domain
Create a set of seed test cases
• Prototypical cloud models
Use the MRs to create follow-up test cases
• Use the MRs to derive cloud models that satisfy the MR’s pre-condition
Use the MRs as oracles between seeds and follow-ups
We can use metamorphic testing to benchmark simulators in a given field
• Do they pass all expected MRs?
13
HOWEVER…
Building an MT environment for a domain is hard
• Complex inputs: parsing is needed to describe metamorphic relations
• Outputs: parsing is needed to check metamorphic relations
• Metamorphic relations:
• Express them in an understandable way
• Flexibility to change and create new ones while testing
• Automated generation of follow-ups
• Statistics of the testing process
• Failure rate of every metamorphic relation
• Analysis of inputs making a metamorphic relation fail
• Capabilities to compare several SUTs (e.g., benchmark simulators)
14
OUR GOAL
Facilitate the construction of MT environments for various domains
• Model-driven engineering for this purpose
• Use meta-modelling and transformations
Practical framework
• Gotten: an Eclipse plugin to build MT environments
• Considers the comparison of alternative SUTs
• Domain-independent: applicable to systems in different domains
• Follow-up generation
• Domain Specific Language for MRs
15
Facilitate the MT process by using AI
• Conversational assistants based on LLMs
• Help in deriving MRs, explaining MRs, generating follow-ups
AGENDA
Model-driven engineering
• Cloud simulation as running example
The Gotten framework
• Benchmarking cloud simulators using MT
AI-based MT
• Assess its potential to help in the MT process
16
MODEL-DRIVEN ENGINEERING
Use models to automate the different phases of software development
• Specification
• Testing
• Verification
• Simulation
• Code generation
• …
Models described using:
• General purpose modelling languages (e.g., UML)
• Domain-specific languages
17
DOMAIN-SPECIFIC
LANGUAGES (DSLs)
Develop software with higher levels of quality and productivity
Increase the level of abstraction: models
Fewer “accidental” details; notation and language closer to the problem
Models are not just documentation
• They are used to generate code, they can be executed
• Specific and well-understood domains
18
DOMAIN-SPECIFIC
LANGUAGES: PARTS
Abstract syntax
• Concepts and primitives of the domain
• Described via a meta-model
Concrete syntax
• How the models are represented and interacted with
• Textual, graphical, tabular, etc.
Semantics
• What models mean
• Via transformations (to other models, to code,
rewriting/simulation)
19
val sim = new Simulation
// Power model for a host
val simplePowerModel = new PowerModel {
override def getPower(cpuUtil: Double): Double = {
100 + (cpuUtil * 200) // idle power plus a utilisation-dependent term
}
}…
META-MODELS
AND MODELS
20
ComputeNode
name: String
cpuCores: int
ramGb: int
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
meta-model
model
csc1: Switch
name=“sw1”
ports=48
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
«conforms to»
Meta-model
• Primitives of the language, properties, relations
• Conceptual model of the domain
• ≈ class diagram (classes, attributes, associations)
Model
• Instance of the meta-model
• ≈ object diagram (objects, slots, links)
0..1
switch
*
nodes
:switch
Switch
name: String
ports: int
speed: double
FULLY PRECISE AT ALL TIMES:
OCL INTEGRITY CONSTRAINTS
21
Meta-model
• Integrity constraints, invariants
• Object Constraint Language (OCL)
«conforms to»
positivePorts inv:
self.ports > 0
enoughPorts inv:
self.ports>= self.nodes->size()
ComputeNode
name: String
cpuCores: int
ramGb: int
meta-model
0..1
switch
*
nodes
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
model
csc1: Switch
name=“sw1”
ports=1
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
:switch
Model
• Should satisfy each invariant
Switch
name: String
ports: int
speed: double
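For illustration only, the two invariants can be read as boolean checks over a small in-memory model (plain Scala case classes standing in for EMF objects); the model on this slide, with one port and two connected nodes, violates enoughPorts:

object OclInvariantsSketch extends App {
  case class ComputeNode(name: String, cpuCores: Int, ramGb: Int)
  case class Switch(name: String, ports: Int, speed: Double, nodes: List[ComputeNode])

  def positivePorts(sw: Switch): Boolean = sw.ports > 0              // self.ports > 0
  def enoughPorts(sw: Switch): Boolean   = sw.ports >= sw.nodes.size // self.ports >= self.nodes->size()

  val n1 = ComputeNode("ML infer", 24, 256)
  val n2 = ComputeNode("ML train", 32, 512)
  val sw = Switch("sw1", ports = 1, speed = 100, nodes = List(n1, n2))

  println(s"positivePorts=${positivePorts(sw)}, enoughPorts=${enoughPorts(sw)}")  // true, false
}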
FULLY PRECISE AT ALL TIMES:
OCL INTEGRITY CONSTRAINTS
22
Meta-model
• Integrity constraints, invariants
• Object Constraint Language (OCL)
«conforms to»
ComputeNode
name: String
cpuCores: int
ramGb: int
meta-model
Switch
name: String
ports: int
speed: double
0..1
switch
*
nodes
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
model
csc1: Switch
name=“sw1”
ports=48
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
:switch
Model
• Should satisfy each invariant
positivePorts inv:
self.ports > 0
enoughPorts inv:
self.ports>= self.nodes->size()
DSLs :
CONCRETE SYNTAX
23
(conditional style)
Graphical concrete syntax specification
Abstract syntax Concrete syntax
ComputeNode
name: String
cpuCores: int
ramGb: int
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
meta-model
model
csc1: Switch
name=“sw1”
ports=48
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
«conforms to»
Switch
ports: int
speed: double
0..1
switch
*
nodes
:switch
«name»
«cpuCores» cores
«ramGb» Gb RAM
«ports» ports
«speed» G
ML infer
24 cores
256 Gb RAM
ML train
32 cores
512 Gb RAM
48 ports
100 G
ports==nodes.size()
«name»
sw1
DSLs :
CONCRETE SYNTAX
24
(conditional style)
Graphical concrete syntax specification
Abstract syntax Concrete syntax
ComputeNode
name: String
cpuCores: int
ramGb: int
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
meta-model
model
csc1: Switch
name=“sw1”
ports=2
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
«conforms to»
Switch
ports: int
speed: double
0..1
switch
*
nodes
:switch
«name»
«cpuCores» cores
«ramGb» Gb RAM
«ports» ports
«speed» G
ML infer
24 cores
256 Gb RAM
ML train
32 cores
512 Gb RAM
2 ports
100 G
ports==nodes.size()
«name»
sw1
DSLs :
CONCRETE SYNTAX
25
Textual syntax specification
Concrete syntax
Node “ML infer” {
cpuCores: 24
ram: 256
switch: sw1
}
Node “ML train” {
cpuCores: 32
ram: 512
}
ComputeNode returns ComputeNode:
‘Node’ name=String ‘{‘
‘cpuCores:’ cpuCores=Int
‘ram:’ ramGb=Int
(‘switch:’ switch=[Switch|String])?
‘}’
…
Abstract syntax
ComputeNode
name: String
cpuCores: int
ramGb: int
n1: ComputeNode
name=“ML infer”
cpuCores= 24
ramGb=256
meta-model
model
csc1: Switch
name=“sw1”
ports=48
speed= 100
n2: ComputeNode
name=“ML train”
cpuCores= 32
ramGb=512
:switch
:nodes
:nodes
«conforms to»
Switch
ports: int
speed: double
0..1
switch
*
nodes
:switch
Switch sw1 {
ports: 48
speed: 100
nodes: “ML infer”,
“ML train”
}
DSL-BASED AUTOMATION
SOLUTIONS
26
Model
ML infer
24 cores
256 Gb RAM
ML train
32 cores
512 Gb RAM
2 ports
100 G
sw1
What can we do with such a model?
MODEL-TO-MODEL
TRANSFORMATION
27
[Figure: a data-centre model (nodes “ML infer” and “ML train” connected to switch sw1) is mapped by an M2M transformation to a Time Petri Net with send/forward transitions and time intervals [0, 0.1] and [1, 10], which can be analysed with a Petri nets tool]
Create another model, possibly conforming to a different meta-model
• Use that other model for analysis or simulation
• Results can be transferred back to the original model (back-annotation)
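A heavily simplified sketch of such a model-to-model transformation (the names and the Petri-net structure are illustrative; real transformations, e.g. written in ATL or with EMF APIs, also create places, arcs and the forwarding structure of the figure):

object M2MSketch extends App {
  case class ComputeNode(name: String, cpuCores: Int, ramGb: Int)
  case class TimedTransition(name: String, minDelay: Double, maxDelay: Double)

  // map every compute node of the source model to a timed "send" transition
  def toPetriNet(nodes: List[ComputeNode]): List[TimedTransition] =
    nodes.map(n => TimedTransition(s"send ${n.name}", 0.0, 0.1))    // interval [0, 0.1] as in the figure

  println(toPetriNet(List(ComputeNode("ML infer", 24, 256), ComputeNode("ML train", 32, 512))))
}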
IN-PLACE MODEL
TRANSFORMATION
28
[Figure: the same data-centre model and its Time Petri Net view; an in-place M2M transformation rewrites the data-centre model to simulate it, with a task at time=0]
Rewrite the model
• Simulation or animation
• Optimisation or refactoring
29
[Figure: the same models after one in-place transformation step; the simulated task is now at time=0.1]
IN-PLACE MODEL
TRANSFORMATION
Rewrite the model
• Simulation or animation
• Optimisation or refactoring
30
[Figure: the same models after further in-place transformation steps; the simulation time is now 0.4]
IN-PLACE MODEL
TRANSFORMATION
Rewrite the model
• Simulation or animation
• Optimisation or refactoring
MODEL-TO-TEXT
TRANSFORMATION
31
[Figure: the data-centre model is additionally fed to a model-to-text (M2text) transformation, which produces the text below]
Text/Code
The following is the data-center physical model:
Node “ML infer” has 24 cores and 256 GB of RAM
Node “ML train” has 32 cores and 512 GB of RAM
All nodes are connected to switch sw1, with 2 ports
And speed of 100G
Generate text from the model
• Code generation
• Documentation, HTML pages
• Prompts for an LLM
• …
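A minimal sketch of a model-to-text transformation producing exactly that kind of text (in practice this would use a template language such as Acceleo or Xtend; plain string interpolation is used here and all names are illustrative):

object M2TextSketch extends App {
  case class ComputeNode(name: String, cpuCores: Int, ramGb: Int)
  case class Switch(name: String, ports: Int, speed: Int)

  def describe(nodes: List[ComputeNode], sw: Switch): String = {
    val header    = "The following is the data-center physical model:"
    val nodeLines = nodes.map(n => s"""Node "${n.name}" has ${n.cpuCores} cores and ${n.ramGb} GB of RAM""")
    val swLine    = s"All nodes are connected to switch ${sw.name}, with ${sw.ports} ports and speed of ${sw.speed}G"
    (List(header) ++ nodeLines ++ List(swLine)).mkString("\n")      // the generated prompt text
  }

  println(describe(List(ComputeNode("ML infer", 24, 256), ComputeNode("ML train", 32, 512)), Switch("sw1", 2, 100)))
}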
DSL-BASED AUTOMATION
SOLUTIONS
32
[Figure: as before, the model-to-text transformation renders the data-centre model as the text shown previously, now used as the prompt for a conversational assistant]
Conversational
assistant
Is there direct communication
between nodes ML infer
and ML train?
Generate text from the model
• Code generation
• Documentation, HTML pages
• Prompts for an LLM
• …
DSL-BASED AUTOMATION
SOLUTIONS
33
[Figure: the same setup; the conversational assistant answers questions about the model using the generated textual description]
Conversational
assistant
Yes, they are both connected to
switch sw1, which implies the
latency would be minimal
Generate text from the model
• Code generation
• Documentation, HTML pages
• Prompts for an LLM
• …
MDE FOR METAMORPHIC TESTING
We can use MDE to facilitate the construction of MT environments
DSLs to describe the inputs
• Cloud models
Model-to-text transformations
• Generate input textual artefacts for the cloud simulator(s)
DSL for expressing the metamorphic relations
• Use OCL for expressing features of the input model
• Parse the SuT outputs to obtain output features
Use search-based model transformation for follow-up generation
34
MDE FOR METAMORPHIC
TESTING
35
Application
expert
Create
domain MMs
Define SuT(s)
execution
start
EMF
ext.
point
Domain
expert
Define MRs
Fine-tune
follow-up
generation
mrDSL
fowDSL
Create input
test cases
Generate
follow-ups
Metamorphic
testing
Define MRs
satisfactory
results?
end
Tester
[yes]
[no]
GOTTEN
A (SIMPLE) META-MODEL FOR
CLOUD SYSTEMS
36
A (SIMPLE) META-MODEL FOR
CLOUD SYSTEMS
37
«conforms to»
DSL FOR METAMORPHIC RELATIONS
38
metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2
models "/sample.gotten/model/dcmodels"
metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2
models "/sample.gotten/model/workloads"
datacentre input Features {
context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum()
context DataCentre def: CPU: Int = racks->collect(
numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum()
}
output Features {
Time: Long
Energy: Long
}
Processor {
Name: String
Version: String
}
MetamorphicRelations {
MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ]
MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ]
}
DSL FOR METAMORPHIC RELATIONS
39
• Meta-models of the SuT inputs
• Model variables
DSL FOR METAMORPHIC RELATIONS
40
• Domain features of input models
• Expressed in OCL
DSL FOR METAMORPHIC RELATIONS
41
• Features of execution outputs
• Extracted from the executions
DSL FOR METAMORPHIC RELATIONS
42
Characterisation of alternative SUTs
(like different simulators)
DSL FOR METAMORPHIC RELATIONS
43
Metamorphic relations:
• Can use input and output features
• Can use the model variables (m1, m2, w1, w2) defined above
DSL FOR METAMORPHIC RELATIONS
44
MR1:
if the cloud m1 has more compute nodes than m2 and workloads are equal
then
the cloud m1 takes less time (or equal) to process the workload than m2
DSL FOR METAMORPHIC RELATIONS
45
MR2:
if the cloud m1 has better CPUs than m2 and workloads are equal
then
the cloud m1 consumes less energy (or equal) to process the workload than m2
PROCESSORS
Specify
• How to transform the input models to the SuT input
• How to run the SuT on specific inputs
• How to parse the output to obtain the output features (e.g., Energy, Time)
Done via extension points defined by Gotten
• Eclipse mechanism for extensibility
46
generate(...)
execute(...)
getFeatures(…)
Processor
«interface»
«implements»
System
under test 1
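A hedged sketch of what such a processor could look like (the method names follow the slide, but the actual Gotten extension-point interface may differ; MyCloudSimProcessor is hypothetical):

trait Processor {
  def generate(inputModel: java.io.File): java.io.File    // transform the input model into the SUT's native input
  def execute(sutInput: java.io.File): String              // run the SUT and return its raw output
  def getFeatures(rawOutput: String): Map[String, Long]    // parse output features, e.g. "Time", "Energy"
}

class MyCloudSimProcessor extends Processor {
  def generate(inputModel: java.io.File): java.io.File = ???   // e.g. a model-to-text transformation to config files
  def execute(sutInput: java.io.File): String = ???            // e.g. launch the simulator as an external process
  def getFeatures(rawOutput: String): Map[String, Long] =
    Map("Time" -> 0L, "Energy" -> 0L)                          // parse the simulator's report here
}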
FOLLOW UPS
47
MR
Follow-up
Generation DSL
Transf. rules
Search config
MoMOT
Seed models
Follow-ups
Martin Fleck, Javier Troya, Manuel Wimmer:
Search-Based Model Transformations with MOMoT. ICMT 2016: 79-87
FOLLOW-UPS: FOWDSL
48
// MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ]
// context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum()
followups for datacentre using MR1
with source folder = "/dcmodels"
and output folder = "/dcmodels"
NNodes ->
delete [1..10] Rack;
delete [1..10] Board;
decrease [1..10] Rack.numBoards;
decrease [1..10] Board.nodesPerBoard
maxSolutions 10
iterations 10
algorithms [Random, NSGAII, NSGAIII, eMOEA]
FOLLOW-UPS: FOWDSL
49
followups for datacentre using MR1
with source folder = "/dcmodels"
and output folder = "/dcmodels"
NNodes ->
decrease [1..4] Rack.numBoards keeping {Rack.numBoards > 0};
decrease [1..4] Board.nodesPerBoard keeping {Board.nodesPerBoard > 0}
maximize ( NNodes(m2) - NNodes(m1) )
maxSolutions 10
iterations 8
algorithms [Random, NSGAII, NSGAIII, eMOEA]
FOLLOW UPS
50
GOTTEN
51
Open source, freely available at
https://g0tten.github.io/home.html
ASSESSING THE APPROACH
52
RQ1: How effective is Gotten for specifying MT environments?
• Is it better than specifying the environment by hand?
RQ2: How suitable are the environments built with Gotten?
• Can we perform MT effectively?
RQ3: Can Gotten be used to create MT environments for different domains?
• Videostreaming
• Automata
(RQ1) EFFECTIVENESS IN
SPECIFYING MT ENVIRONMENTS
53
Cloud simulation domain
• Cost of integrating 2 cloud simulators (Dissect, CloudSimStorage) within Gotten
• Baseline: ad-hoc MT testing tool called FwCloudMeT
FwCloudMeT
• Fixed set of MRs
• Native follow-up generation
Gotten
• Extensible set of MRs (DSL)
• Follow-ups via MoMOT
One order of magnitude less
code in Gotten
(RQ2) SUITABILITY OF GOTTEN
FOR MT
54
Case study: cloud simulation
Definition of MRs
• 6 MRs
Generating follow-ups
• 200 follow-ups for the 6 MRs
• ~11 secs per follow-up
MT process
• 4 passing MRs for both simulators
• Both passing and failing test cases for 1 MR
• 1 failing MR for both simulators
MRs FOR CLOUD SIMULATION
55
better CPU → less energy
machine increase → increase ratio bigger than energy ratio
better storage → less time
56
MRs FOR CLOUD SIMULATION
better network → less time
better memory → less time
shorter workload → less time
MR5 (“better memory implies less or equal processing time”)
• Neither of the two simulators satisfies this relation
• Limitation in the handling of the memory system
MR2 (“The proportional increase in machines should be greater than or equal to the
proportional increase in energy usage”)
• Some scenarios do not meet this expectation
• Some idle machines, inefficient scheduling
57
TESTING PROCESS
(RQ3) MORE CASES: TESTING
VIDEO STREAMING APIS
58
[Figure: APIs organised into domains: a REST API generalises a Video Streaming API (implemented by YouTube, Vimeo, …) and a Music Streaming API (implemented by Spotify, Apple Music, …); each concrete API is connected through a processor]
Organise APIs into domains
Design metamorphic relations
• Applicable across platforms
Design test cases
• Reusable across platforms
Test and compare different platforms
MORE CASES: TESTING VIDEO
STREAMING APIS
59
MORE CASES:
YOUTUBE AND VIMEO
60
videostream input Features {
context VideoAPITest def: IsSearch: Boolean = request.oclIsTypeOf(SearchVideo)
context VideoAPITest def: IsUpdate: Boolean = request.oclIsTypeOf(UpdateVideo)
context SearchVideo def: MaxResults: Int = maxResults
context SearchVideo def: SearchOrder: Int = orderType
}
output Features {
NVideos : Long
OutputVideoId: Long
OutputVideoTitle: String
}
//...
MetamorphicRelations {
MR1 = [ (IsSearch(m1) and MaxResults(m1) >= MaxResults(m2)) implies (NVideos(m1) >= NVideos(m2))]
MR2 = [ (IsSearch(m1) and SearchOrder(m1) <> SearchOrder(m2)) implies (NVideos(m1) == NVideos(m2))]
MR3 = [ IsUpdate(m1) and m1 == m2 implies
(OutputVideoId(m1) <> OutputVideoId(m2)) and
(OutputVideoTitle(m1) == OutputVideoTitle(m2)) ]
}
TESTING PROCESS
Designed 30 test cases
• Automatically generated 120 follow-ups
All test cases could be reused for YouTube and Vimeo
Results:
• All tests for MR1 and MR3 passed
• ~7% of tests for MR2 failed in each platform:
we obtained a different number of videos for search queries with different ordering criteria
61
TESTING THE SIMULATOR OR THE
MODEL BEING SIMULATED?
We can build MRs to test either the simulator or the model
Let's consider MT for Deterministic Finite Automata (DFAs)
• Consider the MR: (w' == w.1) and Accept(dfa, w') implies not Accept(dfa, w)
• We are testing that the DFA model behaves according to our expectations
Instead, consider the MR:
• (FinalStates(dfa2) == States(dfa1)-FinalStates(dfa1))
implies Accept(dfa1, w1) != Accept(dfa2, w1)
(swapping final states yields complement language)
• Here we are testing a general property of DFAs that every simulator should fulfil
62
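A minimal, simulator-independent sketch of the second MR (the DFA encoding is illustrative): swapping final and non-final states must flip acceptance for every word:

object DfaComplementMR extends App {
  case class DFA(states: Set[Int], init: Int, finals: Set[Int], delta: Map[(Int, Char), Int]) {
    def accepts(w: String): Boolean =
      finals.contains(w.foldLeft(init)((q, c) => delta((q, c))))
  }

  // dfa1 accepts words over {0,1} with an even number of 1s
  val dfa1 = DFA(Set(0, 1), 0, Set(0),
                 Map((0, '0') -> 0, (0, '1') -> 1, (1, '0') -> 1, (1, '1') -> 0))
  val dfa2 = dfa1.copy(finals = dfa1.states -- dfa1.finals)   // follow-up: complemented final states

  val w = "1011"
  assert(dfa1.accepts(w) != dfa2.accepts(w), "MR violated")   // acceptance must differ on every word
}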
TESTING THE SIMULATOR OR
THE MODEL BEING SIMULATED?
63
Model
Input
MRs involving modifications of the model usually test the simulator
Simulator
TESTING THE SIMULATOR OR THE
MODEL BEING SIMULATED?
64
MRs involving modifications of the model's input usually test the model
Model
Input
Simulator
AI FOR METAMORPHIC
TESTING
Generative AI
Large Language Models (LLMs)
• Produce suitable answers out of natural language text
Extensively used today in many domains
API-based integration in applications
• Agent-based programming
• LangGraph, AutoGen, CrewAI, …
65
Text
Text
LLM
LLM
task1 task2
…
shared state
(prompt)
prompt
Structured
output
AI FOR METAMORPHIC
TESTING
Use large language models to help the tester
Assistive tasks
• Create an MR out of natural language text
• Explain an MR in natural language
• Simplify an existing MR
• Combine existing MRs
• Derive new MRs
• Derive N follow-up test cases of a model for an MR
• Derive initial test cases
• …
66
AI FOR METAMORPHIC
TESTING: STRATEGY
Prompt engineering
• Explanation of Gotten
• Grammar of Gotten (Xtext)
• Meta-model of the domain
• Current model
• Current selection in tool
• Specific prompt depending on task
Agent workflow
• Some tasks solved with the LLM
• Safeguards: error checking and repair
67
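A hedged sketch of the prompt-assembly step listed above (the prompt layout and parameter names are assumptions, not Gotten's actual code):

object PromptBuilder {
  def buildPrompt(gottenExplanation: String, grammar: String, metamodel: String,
                  currentModel: String, selection: String, taskPrompt: String): String =
    s"""$gottenExplanation
       |
       |Gotten grammar (Xtext):
       |$grammar
       |
       |Domain meta-model:
       |$metamodel
       |
       |Current model:
       |$currentModel
       |
       |Current selection:
       |$selection
       |
       |Task:
       |$taskPrompt""".stripMargin
}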
AGENT WORKFLOW TO CREATE MR
68
Task
Classification
Create MR
out of NL
Syntax
checker
MR fixer
LLM
…
openAI
…
(all other tasks)
Gotten
IDE
prompt
Gotten
code
Errors
Gotten
code
compilation
AGENT WORKFLOW TO CREATE FOLLOW UPS
69
Task
Classification
Create N
followups
EMF
checker
MR
checker
LLM
…
openAI
…
(all other tasks)
Gotten
IDE
prompt
EMF
model
Errors
syntactic
repair
errors? errors?
semantic
repair
Error
y
n
y
more?
n
Generated
models
y
n
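A hedged sketch of the generate/check/repair loop of this workflow (askLLM, parsesAsEmfModel and satisfiesMrPrecondition are hypothetical stand-ins for the LLM call, the EMF checker and the MR checker):

object FollowUpLoopSketch {
  def askLLM(prompt: String): String = ???                           // call the LLM (e.g. through an OpenAI client)
  def parsesAsEmfModel(text: String): Either[String, String] = ???   // Left(errors) or Right(valid EMF model)
  def satisfiesMrPrecondition(model: String): Either[String, Unit] = ???

  def generateFollowUp(prompt: String, maxRepairs: Int = 3): Option[String] = {
    var attempt = askLLM(prompt)
    var repairs = 0
    while (repairs < maxRepairs) {
      parsesAsEmfModel(attempt) match {
        case Left(errors) =>                                          // syntactic repair
          attempt = askLLM(s"$prompt\nThe previous model had syntax errors:\n$errors\nPlease fix them.")
        case Right(model) =>
          satisfiesMrPrecondition(model) match {
            case Right(_)     => return Some(model)                   // valid follow-up: done
            case Left(errors) =>                                      // semantic repair
              attempt = askLLM(s"$prompt\nThe model violates the MR pre-condition:\n$errors\nPlease fix it.")
          }
      }
      repairs += 1
    }
    None                                                              // give up after maxRepairs attempts
  }
}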
PRELIMINARY RESULTS FOR
FOLLOW UP GENERATION
70
MR                                                       N    Correct (syn)  Correct (sem)  Diff  Median (ms)
CPU(m1) > CPU(m2) and w1 == w2 implies …                 10   10             10             10    3296
NMachines(m1) > NMachines(m2) and w1 == w2 implies …     10   10             10             10    3351
Storage(m1) > Storage(m2) and w1 == w2 implies …         10   10             10             10    3606
Network(m1) > Network(m2) and w1 == w2 implies …         10   10             10             10    3448
Memory(m1) > Memory(m2) and w1 == w2 implies …           10   10             10             10    3088
Task: Generate N follow-up models from seed model and MR
Model: gpt-4.1-mini, temperature=0.7
Seed model size: 6 objects [1 rack]
PRELIMINARY RESULTS FOR
FOLLOW UP GENERATION
71
                                                              First loop                   Repair
MR                                                       N    Correct (sem)  Median (ms)   Correct (sem)  Median (ms)
CPU(m1) > CPU(m2) and w1 == w2 implies …                 10   8              6246          2              7889
NMachines(m1) > NMachines(m2) and w1 == w2 implies …     10   6              7982          4              5514
Storage(m1) > Storage(m2) and w1 == w2 implies …         10   8              7788          2              5569
Network(m1) > Network(m2) and w1 == w2 implies …         10   8              6488          2              6615
Memory(m1) > Memory(m2) and w1 == w2 implies …           10   9              6611          1              8946
Task: Generate N follow-up models from seed model and MR
Model: gpt-4.1-mini, temperature=0.7
Seed model size: 18 objects [4 racks]
COMMENTS… THE POSITIVE
Promising results
• All models syntactically correct at first generation
• All models semantically correct (are follow-ups) at first generation
Faster than using MoMOT
• Search-based transformation is very heavyweight
All follow-ups are different
• This requirement was included in the prompt
• However, all of them are structurally equal (they differ only in attribute values)
72
COMMENTS… THE CAVEATS
Caveats:
• Relatively simple MRs
• Simple seed models
Discussion
• Is this the right approach to follow-up generation?
• SAT solving/Search-based optimisation vs. Prompt + repair cycles
• Solid engineering vs. More fragile systems
• Measure structural diversity of solutions
• Handling constraints that could not be handled before (e.g., on Strings)?
73
CONCLUSIONS
Metamorphic testing helps to assess systems that are hard to test
• Simulators often fall into this category
Metamorphic relations
• Involve several inputs and their expected results
• Both as oracles and to generate follow-up test cases
The Gotten framework helps to create metamorphic testing environments
• Based on MDE principles
• Examples for cloud simulation
AI for metamorphic testing
• Agentic workflows based on LLMs to help in several tasks
74
OUTLOOK
Metamorphic testing in other domains (involving simulation)
• If you have a case study, let’s talk!
Systematic assessment of AI assistance quality
• Evaluation of each task
Metamorphic testing to assess conversational agents
• User simulation + metamorphic rules
75
MORE ABOUT GOTTEN
1. GOTTEN: A model-driven solution to engineer domain-specific metamorphic testing
environments. Pablo Gómez-Abajo, Pablo C. Cañizares, Alberto Núñez, Esther Guerra, Juan de
Lara. 2023. In ACM/IEEE 26th International Conference on Model Driven Engineering Languages
and Systems (MoDELS 2023), Västerås.
2. Automated engineering of domain-specific metamorphic testing environments. Pablo Gómez-
Abajo, Pablo C. Cañizares, Alberto Núñez, Esther Guerra, Juan de Lara. 2023. In Information and
Software Technology (Elsevier).
3. New ideas: Automated engineering of metamorphic testing environments for domain-specific
languages. Pablo C. Cañizares, Pablo Gómez-Abajo, Alberto Núñez, Esther Guerra, Juan de
Lara. 2021. In ACM SIGPLAN International Conference on Software Language Engineering (SLE
2021), Chicago. Best new ideas/vision paper award at SLE’21
76
https://g0tten.github.io/home.html
THANKS!
Questions?
Juan.deLara@uam.es
77
http://www.miso.es
modelling &
software engineering
research group
https://g0tten.github.io/home.html
More Related Content

Similar to AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION (20)

The power of AI and ML in Testing .
The power of AI and ML in Testing       .The power of AI and ML in Testing       .
The power of AI and ML in Testing .
tisnatom
 
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
Tao Xie
 
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
Agile Testing Alliance
 
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
Tao Xie
 
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Machine learning testing  survey, landscapes and horizons, the Cliff NotesMachine learning testing  survey, landscapes and horizons, the Cliff Notes
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Heemeng Foo
 
ReusingMT
ReusingMTReusingMT
ReusingMT
miso_uam
 
Test for AI model
Test for AI modelTest for AI model
Test for AI model
Arithmer Inc.
 
Model-based Testing Principles
Model-based Testing PrinciplesModel-based Testing Principles
Model-based Testing Principles
Henry Muccini
 
MODELS2013_MDHPCL_Presentation
MODELS2013_MDHPCL_PresentationMODELS2013_MDHPCL_Presentation
MODELS2013_MDHPCL_Presentation
Dionny Santiago
 
How Machine learning Integration supports testing automation in software
How Machine learning Integration supports testing automation in softwareHow Machine learning Integration supports testing automation in software
How Machine learning Integration supports testing automation in software
hamzaaftab25
 
Metamorphic Testing Thesis Defense.pptx
Metamorphic Testing Thesis Defense.pptxMetamorphic Testing Thesis Defense.pptx
Metamorphic Testing Thesis Defense.pptx
entertainmentweekly11
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Tao Xie
 
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
Jim Jimenez
 
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATIONTRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
ijseajournal
 
Integrating AI in software quality in absence of a well-defined requirements
Integrating AI in software quality in absence of a well-defined requirementsIntegrating AI in software quality in absence of a well-defined requirements
Integrating AI in software quality in absence of a well-defined requirements
Nagarro
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal Perspective
Lionel Briand
 
Meta-modeling: concepts, tools and applications
Meta-modeling: concepts, tools and applicationsMeta-modeling: concepts, tools and applications
Meta-modeling: concepts, tools and applications
Saïd Assar
 
Introduction to architectures based on models, models and metamodels. model d...
Introduction to architectures based on models, models and metamodels. model d...Introduction to architectures based on models, models and metamodels. model d...
Introduction to architectures based on models, models and metamodels. model d...
Vicente García Díaz
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software Testing
Lionel Briand
 
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
AIRCC Publishing Corporation
 
The power of AI and ML in Testing .
The power of AI and ML in Testing       .The power of AI and ML in Testing       .
The power of AI and ML in Testing .
tisnatom
 
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
Tao Xie
 
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
#ATAGTR2019 Presentation "Assuring Quality for AI based applications" By Vino...
Agile Testing Alliance
 
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
Tao Xie
 
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Machine learning testing  survey, landscapes and horizons, the Cliff NotesMachine learning testing  survey, landscapes and horizons, the Cliff Notes
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Heemeng Foo
 
Model-based Testing Principles
Model-based Testing PrinciplesModel-based Testing Principles
Model-based Testing Principles
Henry Muccini
 
MODELS2013_MDHPCL_Presentation
MODELS2013_MDHPCL_PresentationMODELS2013_MDHPCL_Presentation
MODELS2013_MDHPCL_Presentation
Dionny Santiago
 
How Machine learning Integration supports testing automation in software
How Machine learning Integration supports testing automation in softwareHow Machine learning Integration supports testing automation in software
How Machine learning Integration supports testing automation in software
hamzaaftab25
 
Metamorphic Testing Thesis Defense.pptx
Metamorphic Testing Thesis Defense.pptxMetamorphic Testing Thesis Defense.pptx
Metamorphic Testing Thesis Defense.pptx
entertainmentweekly11
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Tao Xie
 
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE D...
Jim Jimenez
 
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATIONTRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
TRANSFORMING SOFTWARE REQUIREMENTS INTO TEST CASES VIA MODEL TRANSFORMATION
ijseajournal
 
Integrating AI in software quality in absence of a well-defined requirements
Integrating AI in software quality in absence of a well-defined requirementsIntegrating AI in software quality in absence of a well-defined requirements
Integrating AI in software quality in absence of a well-defined requirements
Nagarro
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal Perspective
Lionel Briand
 
Meta-modeling: concepts, tools and applications
Meta-modeling: concepts, tools and applicationsMeta-modeling: concepts, tools and applications
Meta-modeling: concepts, tools and applications
Saïd Assar
 
Introduction to architectures based on models, models and metamodels. model d...
Introduction to architectures based on models, models and metamodels. model d...Introduction to architectures based on models, models and metamodels. model d...
Introduction to architectures based on models, models and metamodels. model d...
Vicente García Díaz
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software Testing
Lionel Briand
 
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
Testing-as-a-Service (TaaS) – Capabilities and Features for Real-Time Testing...
AIRCC Publishing Corporation
 

More from miso_uam (20)

Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
miso_uam
 
Model-driven engineering for AR
Model-driven engineering for ARModel-driven engineering for AR
Model-driven engineering for AR
miso_uam
 
Capone.pdf
Capone.pdfCapone.pdf
Capone.pdf
miso_uam
 
MLE_keynote.pdf
MLE_keynote.pdfMLE_keynote.pdf
MLE_keynote.pdf
miso_uam
 
Multi21
Multi21Multi21
Multi21
miso_uam
 
MLMPLs
MLMPLsMLMPLs
MLMPLs
miso_uam
 
Scientific writing
Scientific writingScientific writing
Scientific writing
miso_uam
 
Facets_UCM
Facets_UCMFacets_UCM
Facets_UCM
miso_uam
 
SLE_MIP08
SLE_MIP08SLE_MIP08
SLE_MIP08
miso_uam
 
mtATL
mtATLmtATL
mtATL
miso_uam
 
Máster en Métodos Formales en Ingeniería Informática
Máster en Métodos Formales en Ingeniería InformáticaMáster en Métodos Formales en Ingeniería Informática
Máster en Métodos Formales en Ingeniería Informática
miso_uam
 
Analysing-MMPLs
Analysing-MMPLsAnalysing-MMPLs
Analysing-MMPLs
miso_uam
 
Facets
FacetsFacets
Facets
miso_uam
 
kite
kitekite
kite
miso_uam
 
MTPLs
MTPLsMTPLs
MTPLs
miso_uam
 
Miso-McGill
Miso-McGillMiso-McGill
Miso-McGill
miso_uam
 
Model Transformation Reuse
Model Transformation ReuseModel Transformation Reuse
Model Transformation Reuse
miso_uam
 
DSLcomet
DSLcometDSLcomet
DSLcomet
miso_uam
 
MDE-experiments
MDE-experimentsMDE-experiments
MDE-experiments
miso_uam
 
keynote modelsward 2017
keynote modelsward 2017keynote modelsward 2017
keynote modelsward 2017
miso_uam
 
Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
miso_uam
 
Model-driven engineering for AR
Model-driven engineering for ARModel-driven engineering for AR
Model-driven engineering for AR
miso_uam
 
Capone.pdf
Capone.pdfCapone.pdf
Capone.pdf
miso_uam
 
MLE_keynote.pdf
MLE_keynote.pdfMLE_keynote.pdf
MLE_keynote.pdf
miso_uam
 
Scientific writing
Scientific writingScientific writing
Scientific writing
miso_uam
 
Facets_UCM
Facets_UCMFacets_UCM
Facets_UCM
miso_uam
 
Máster en Métodos Formales en Ingeniería Informática
Máster en Métodos Formales en Ingeniería InformáticaMáster en Métodos Formales en Ingeniería Informática
Máster en Métodos Formales en Ingeniería Informática
miso_uam
 
Analysing-MMPLs
Analysing-MMPLsAnalysing-MMPLs
Analysing-MMPLs
miso_uam
 
Miso-McGill
Miso-McGillMiso-McGill
Miso-McGill
miso_uam
 
Model Transformation Reuse
Model Transformation ReuseModel Transformation Reuse
Model Transformation Reuse
miso_uam
 
MDE-experiments
MDE-experimentsMDE-experiments
MDE-experiments
miso_uam
 
keynote modelsward 2017
keynote modelsward 2017keynote modelsward 2017
keynote modelsward 2017
miso_uam
 
Ad

Recently uploaded (20)

The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...
Prachi Desai
 
iOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod KumariOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod Kumar
Pramod Kumar
 
14 Years of Developing nCine - An Open Source 2D Game Framework
14 Years of Developing nCine - An Open Source 2D Game Framework14 Years of Developing nCine - An Open Source 2D Game Framework
14 Years of Developing nCine - An Open Source 2D Game Framework
Angelo Theodorou
 
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptxIMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
usmanch7829
 
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfHow to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
victordsane
 
Rebuilding Cadabra Studio: AI as Our Core Foundation
Rebuilding Cadabra Studio: AI as Our Core FoundationRebuilding Cadabra Studio: AI as Our Core Foundation
Rebuilding Cadabra Studio: AI as Our Core Foundation
Cadabra Studio
 
Leveraging Foundation Models to Infer Intents
Leveraging Foundation Models to Infer IntentsLeveraging Foundation Models to Infer Intents
Leveraging Foundation Models to Infer Intents
Keheliya Gallaba
 
Online Queue Management System for Public Service Offices [Focused on Municip...
Online Queue Management System for Public Service Offices [Focused on Municip...Online Queue Management System for Public Service Offices [Focused on Municip...
Online Queue Management System for Public Service Offices [Focused on Municip...
Rishab Acharya
 
Neuralink Templateeeeeeeeeeeeeeeeeeeeeeeeee
Neuralink TemplateeeeeeeeeeeeeeeeeeeeeeeeeeNeuralink Templateeeeeeeeeeeeeeeeeeeeeeeeee
Neuralink Templateeeeeeeeeeeeeeeeeeeeeeeeee
alexandernoetzold
 
Maintaining + Optimizing Database Health: Vendors, Orchestrations, Enrichment...
Maintaining + Optimizing Database Health: Vendors, Orchestrations, Enrichment...Maintaining + Optimizing Database Health: Vendors, Orchestrations, Enrichment...
Maintaining + Optimizing Database Health: Vendors, Orchestrations, Enrichment...
BradBedford3
 
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps CyclesFrom Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
From Chaos to Clarity - Designing (AI-Ready) APIs with APIOps Cycles
Marjukka Niinioja
 
Micro-Metrics Every Performance Engineer Should Validate Before Sign-Off
Micro-Metrics Every Performance Engineer Should Validate Before Sign-OffMicro-Metrics Every Performance Engineer Should Validate Before Sign-Off
Micro-Metrics Every Performance Engineer Should Validate Before Sign-Off
Tier1 app
 
Build Smarter, Deliver Faster with Choreo - An AI Native Internal Developer P...
Build Smarter, Deliver Faster with Choreo - An AI Native Internal Developer P...Build Smarter, Deliver Faster with Choreo - An AI Native Internal Developer P...
Build Smarter, Deliver Faster with Choreo - An AI Native Internal Developer P...
WSO2
 
Best Inbound Call Tracking Software for Small Businesses
Best Inbound Call Tracking Software for Small BusinessesBest Inbound Call Tracking Software for Small Businesses
Best Inbound Call Tracking Software for Small Businesses
TheTelephony
 
Agile Software Engineering Methodologies
Agile Software Engineering MethodologiesAgile Software Engineering Methodologies
Agile Software Engineering Methodologies
Gaurav Sharma
 
Providing Better Biodiversity Through Better Data
Providing Better Biodiversity Through Better DataProviding Better Biodiversity Through Better Data
Providing Better Biodiversity Through Better Data
Safe Software
 
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdfThe Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
The Future of Open Source Reporting Best Alternatives to Jaspersoft.pdf
Varsha Nayak
 
Artificial Intelligence Applications Across Industries
Artificial Intelligence Applications Across IndustriesArtificial Intelligence Applications Across Industries
Artificial Intelligence Applications Across Industries
SandeepKS52
 
Why Indonesia’s $12.63B Alt-Lending Boom Needs Loan Servicing Automation & Re...
Why Indonesia’s $12.63B Alt-Lending Boom Needs Loan Servicing Automation & Re...Why Indonesia’s $12.63B Alt-Lending Boom Needs Loan Servicing Automation & Re...
Why Indonesia’s $12.63B Alt-Lending Boom Needs Loan Servicing Automation & Re...
Prachi Desai
 
Design by Contract - Building Robust Software with Contract-First Development
Design by Contract - Building Robust Software with Contract-First DevelopmentDesign by Contract - Building Robust Software with Contract-First Development
Design by Contract - Building Robust Software with Contract-First Development
Par-Tec S.p.A.
 
The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...The rise of e-commerce has redefined how retailers operate—and reconciliation...
The rise of e-commerce has redefined how retailers operate—and reconciliation...
Prachi Desai
 
iOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod KumariOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod Kumar
Pramod Kumar
 
14 Years of Developing nCine - An Open Source 2D Game Framework
14 Years of Developing nCine - An Open Source 2D Game Framework14 Years of Developing nCine - An Open Source 2D Game Framework
14 Years of Developing nCine - An Open Source 2D Game Framework
Angelo Theodorou
 
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptxIMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
usmanch7829
 
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdfHow to purchase, license and subscribe to Microsoft Azure_PDF.pdf
How to purchase, license and subscribe to Microsoft Azure_PDF.pdf
victordsane
 
Rebuilding Cadabra Studio: AI as Our Core Foundation
AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION

  • 13. HOW IS THIS RELATED TO SIMULATION? Create a set of MRs capturing expert knowledge in the simulators’ domain Create a set of seed test cases • Prototypical cloud models Use the MRs to create follow-up test cases • Use the MRs to derive cloud models that satisfy the MR’s pre-condition Use the MRs as oracles between seeds and follow-ups We can use metamorphic testing to benchmark simulators in a given field • Do they pass all expected MRs? 13
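To make this process concrete, here is a minimal sketch in Python (purely illustrative, not part of Gotten) of the seed/follow-up/oracle loop for a cloud simulator; run_simulator, the model dictionaries and the MR predicate are hypothetical stand-ins.

# Minimal MT loop for a simulator (illustrative sketch; run_simulator stands in for the real SUT)
def run_simulator(cloud_model, workload):
    # Pretend SUT: report the simulated execution time of the workload on the model
    return {"exec_time": sum(workload) / max(cloud_model["nodes"], 1)}

def follow_up(cloud_model):
    # Derive a follow-up input satisfying the MR pre-condition: fewer computing nodes
    return {**cloud_model, "nodes": max(cloud_model["nodes"] - 1, 1)}

def mr_fewer_nodes_not_faster(seed, follow, workload):
    # MR: same workload and fewer nodes => execution time should be greater or equal
    t_seed = run_simulator(seed, workload)["exec_time"]
    t_follow = run_simulator(follow, workload)["exec_time"]
    return t_follow >= t_seed

seeds = [{"nodes": 4}, {"nodes": 8}]          # prototypical cloud models
workload = [10.0, 20.0, 5.0]
print([mr_fewer_nodes_not_faster(s, follow_up(s), workload) for s in seeds])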
  • 14. HOWEVER… Building an MT environment for a domain is hard • Complex inputs: Parsing to describe metamorphic relations • Outputs: Parsing to check metamorphic relations • Metamorphic relations: • Express them in an understandable way • Flexibility to change and create new ones while testing • Automated generation of follow-ups • Statistics of the testing process • Failure rate of every metamorphic relation • Analysis of inputs making a metamorphic relation fail • Capabilities to compare several SUTs (e.g., benchmark simulators) 14
  • 15. OUR GOAL Facilitate the construction of MT environments for various domains • Model-driven engineering for this purpose • Use meta-modelling and transformations Practical framework • Gotten: an Eclipse plugin to build MT environments • Considers the comparison of alternative SUTs • Domain-independent: applicable to systems in different domains • Follow-up generation • Domain Specific Language for MRs 15 Facilitate the MT process by using AI • Conversational assistants based on LLMs • Help in deriving MRs, explaining MRs, generating follow-ups
  • 16. AGENDA Model-driven engineering • Cloud simulation as running example The Gotten framework • Benchmarking cloud simulators using MT AI-based MT • Assess its potential to help in the MT process 16
  • 17. MODEL-DRIVEN ENGINEERING Use models to automate the different phases of software development • Specification • Testing • Verification • Simulation • Code generation • … Models described using: • General purpose modelling languages (e.g., UML) • Domain-specific languages 17
  • 18. DOMAIN-SPECIFIC LANGUAGES (DSLs) Develop software with higher levels of quality and productivity Increasing the level of abstraction: models Fewer “accidental” details, notations, and language closer to the problem Models are not just documentation • They are used to generate code, they can be executed • Specific and well-understood domains 18
  • 19. DOMAIN-SPECIFIC LANGUAGES: PARTS Abstract syntax • Concepts and primitives of the domain • Described via a meta-model Concrete syntax • How the models are represented and interacted with • Textual, graphical, tabular, etc. Semantics • What models mean • Via transformations (to other models, to code, rewriting/simulation) 19 val sim = new Simulation // Power model for a host val simplePowerModel = new PowerModel { override def getPower(cpuUtilization: Double): Double = { 100 + (cpuUtilization * 200) } }…
  • 20. META-MODELS AND MODELS 20 ComputeNode name: String cpuCores: int ramGb: int n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 meta-model model csc1: Switch name=“sw1” ports=48 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes «conforms to» Meta-model • Primitives of the language, properties, relations • Conceptual model of the domain • ≈ class diagram (classes, attributes, associations) Model • Instance of the meta-model • ≈ object diagram (objects, slots, links) 0..1 switch * nodes :switch Switch name: String ports: int speed: double
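As a rough analogue (illustrative only; real Gotten meta-models are Ecore/EMF artefacts, not Python), the meta-model plays the role of class definitions and a conforming model the role of object instances:

# Meta-model ~ class definitions; model ~ object instances (names mirror the slide)
from dataclasses import dataclass
from typing import Optional

@dataclass
class Switch:
    name: str
    ports: int
    speed: float

@dataclass
class ComputeNode:
    name: str
    cpuCores: int
    ramGb: int
    switch: Optional[Switch] = None   # the 0..1 "switch" reference

# A model conforming to the meta-model
sw1 = Switch(name="sw1", ports=48, speed=100.0)
n1 = ComputeNode(name="ML infer", cpuCores=24, ramGb=256, switch=sw1)
n2 = ComputeNode(name="ML train", cpuCores=32, ramGb=512, switch=sw1)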
  • 21. FULLY PRECISE AT ALL TIMES: OCL INTEGRITY CONSTRAINTS 21 Meta-model • Integrity constraints, invariants • Object Constraint Language (OCL) «conforms to» positivePorts inv: self.ports > 0 enoughPorts inv: self.ports>= self.nodes->size() ComputeNode name: String cpuCores: int ramGb: int meta-model 0..1 switch * nodes n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 model csc1: Switch name=“sw1” ports=1 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes :switch Model • Should satisfy each invariant Switch name: String ports: int speed: double
  • 22. FULLY PRECISE AT ALL TIMES: OCL INTEGRITY CONSTRAINTS 22 Meta-model • Integrity constraints, invariants • Object Constraint Language (OCL) «conforms to» ComputeNode name: String cpuCores: int ramGb: int meta-model Switch name: String ports: int speed: double 0..1 switch * nodes n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 model csc1: Switch name=“sw1” ports=48 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes :switch Model • Should satisfy each invariant positivePorts inv: self.ports > 0 enoughPorts inv: self.ports>= self.nodes->size()
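A rough Python analogue of those two OCL invariants (again only an illustration of what they check):

# positivePorts inv: self.ports > 0
def positive_ports(switch):
    return switch["ports"] > 0

# enoughPorts inv: self.ports >= self.nodes->size()
def enough_ports(switch, nodes):
    return switch["ports"] >= len(nodes)

sw1 = {"name": "sw1", "ports": 1}
nodes = ["ML infer", "ML train"]
print(positive_ports(sw1), enough_ports(sw1, nodes))  # True False: the 1-port model violates enoughPorts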
  • 23. DSLs : CONCRETE SYNTAX 23 (conditional style) Graphical concrete syntax specification Abstract syntax Concrete syntax ComputeNode name: String cpuCores: int ramGb: int n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 meta-model model csc1: Switch name=“sw1” ports=48 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes «conforms to» Switch ports: int speed: double 0..1 switch * nodes :switch «name» «cpuCores» cores «ramGb» Gb RAM «ports» ports «speed» G ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 48 ports 100 G ports==nodes.size() «name» sw1
  • 24. DSLs : CONCRETE SYNTAX 24 (conditional style) Graphical concrete syntax specification Abstract syntax Concrete syntax ComputeNode name: String cpuCores: int ramGb: int n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 meta-model model csc1: Switch name=“sw1” ports=2 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes «conforms to» Switch ports: int speed: double 0..1 switch * nodes :switch «name» «cpuCores» cores «ramGb» Gb RAM «ports» ports «speed» G ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G ports==nodes.size() «name» sw1
  • 25. DSLs : CONCRETE SYNTAX 25 Textual syntax specification Concrete syntax Node “ML infer” { cpuCores: 24 ram: 256 switch: sw1 } Node “ML train” { cpuCores: 32 ram: 512 } ComputeNode returns ComputeNode: ‘Node’ name=String ‘{‘ ‘cpuCores:’ cpuCores=Int ‘ram:’ ramGb=Int (‘switch:’ switch=[Switch|String])? ‘}’ … Abstract syntax ComputeNode name: String cpuCores: int ramGb: int n1: ComputeNode name=“ML infer” cpuCores= 24 ramGb=256 meta-model model csc1: Switch name=“sw1” ports=48 speed= 100 n2: ComputeNode name=“ML train” cpuCores= 32 ramGb=512 :switch :nodes :nodes «conforms to» Switch ports: int speed: double 0..1 switch * nodes :switch Switch sw1 { ports: 48 speed: 100 nodes: “ML infer”, “ML train” }
  • 26. DSL-BASED AUTOMATION SOLUTIONS 26 Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 What can we do with such a model?
  • 27. MODEL-TO-MODEL TRANSFORMATION 27 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Create another model, possibly conforming to a different meta-model • Use that other model for analysis or simulation • Results can be transferred back to the original model (back-annotation)
  • 28. IN-PLACE MODEL TRANSFORMATION 28 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator task time=0 Rewrite the model • Simulation or animation • Optimisation or refactoring
  • 29. 29 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator task time=0.1 IN-PLACE MODEL TRANSFORMATION Rewrite the model • Simulation or animation • Optimisation or refactoring
  • 30. 30 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator time=0.4 IN-PLACE MODEL TRANSFORMATION Rewrite the model • Simulation or animation • Optimisation or refactoring
  • 31. MODEL-TO-TEXT TRANSFORMATION 31 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator time=0.4 M2text transformation Text/Code The following is the data-center physical model: Node “ML infer” has 24 cores and 256 GB of RAM Node “ML train” has 32 cores and 512 GB of RAM All nodes are connected to switch sw1, with 2 ports And speed of 100G Generate text from the model • Code generation • Documentation, HTML pages • Prompts for an LLM • …
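A minimal model-to-text sketch of the kind of template that produces the text above (illustrative; in an Eclipse/EMF setting this would typically be written with a template language such as Acceleo or Xtend):

# Generate a textual description (e.g., an LLM prompt) from the model
def to_text(nodes, switch):
    lines = ["The following is the data-center physical model:"]
    for n in nodes:
        lines.append(f'Node "{n["name"]}" has {n["cpuCores"]} cores and {n["ramGb"]} GB of RAM')
    lines.append(f'All nodes are connected to switch {switch["name"]}, with {switch["ports"]} ports '
                 f'and speed of {switch["speed"]}G')
    return "\n".join(lines)

model = {"nodes": [{"name": "ML infer", "cpuCores": 24, "ramGb": 256},
                   {"name": "ML train", "cpuCores": 32, "ramGb": 512}],
         "switch": {"name": "sw1", "ports": 2, "speed": 100}}
print(to_text(model["nodes"], model["switch"]))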
  • 32. DSL-BASED AUTOMATION SOLUTIONS 32 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator time=0.4 M2text transformation Text/Code The following is the data-center physical model: Node “ML infer” has 24 cores and 256 GB of RAM Node “ML train” has 32 cores and 512 GB of RAM All nodes are connected to switch sw1, with 2 ports And speed of 100G Conversational assistant Is there direct communication between nodes ML infer and ML train? Generate text from the model • Code generation • Documentation, HTML pages • Prompts for an LLM • …
  • 33. DSL-BASED AUTOMATION SOLUTIONS 33 Model (Time Petri Nets) sw1 ML infer [0, 0.1] M2M transformation Petri nets tool Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 [1, 10] ML train [0, 0.1] [1, 10] Fwd train send infer send train Fwd infer next train next infer infer-in train-in Model ML infer 24 cores 256 Gb RAM ML train 32 cores 512 Gb RAM 2 ports 100 G sw1 M2M transformation (in-place) Simulator time=0.4 M2text transformation Text/Code The following is the data-center physical model: Node “ML infer” has 24 cores and 256 GB of RAM Node “ML train” has 32 cores and 512 GB of RAM All nodes are connected to switch sw1, with 2 ports And speed of 100G Conversational assistant Yes, they are both connected to switch sw1, which implies the latency would be minimal Generate text from the model • Code generation • Documentation, HTML pages • Prompts for an LLM • …
  • 34. MDE FOR METAMORPHIC TESTING We can use MDE to facilitate the construction of MT environments DSLs to describe the inputs • Cloud models Model-to-text transformations • Generate input textual artefacts for the cloud simulator(s) DSL for expressing the metamorphic relations • Use OCL for expressing features of the input model • Parse the SuT outputs to obtain output features Use search-based model transformation for follow-up generation 34
  • 35. MDE FOR METAMORPHIC TESTING 35 [Process diagram] The application expert creates the domain meta-models and defines the SuT(s) execution (via an EMF extension point). The domain expert defines the MRs (mrDSL) and fine-tunes follow-up generation (fowDSL). The tester creates input test cases, generates follow-ups and runs the metamorphic testing; if the results are not satisfactory, the MRs are redefined and the process iterates. All steps are supported by GOTTEN.
  • 36. A (SIMPLE) META-MODEL FOR CLOUD SYSTEMS 36
  • 37. A (SIMPLE) META-MODEL FOR CLOUD SYSTEMS 37 «conforms to»
  • 38. DSL FOR METAMORPHIC RELATIONS 38 metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] }
  • 39. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 39 • Meta-models of the SuT inputs • Model variables
  • 40. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 40 • Domain features of input models • Expressed in OCL
  • 41. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 41 • Features of execution outputs • Extracted from the executions
  • 42. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 42 Characterisation of alternative SUTs (like different simulators)
  • 43. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 43 Metamorphic relations: • Can use input and output features • Can use the model variables (m1, m2, w1, w2) defined above
  • 44. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 44 MR1: if the cloud m1 has more compute nodes than m2 and workloads are equal then the cloud m1 takes less time (or equal) to process the workload than m2
  • 45. metamodel datacentre "/sample.gotten/model/datac.ecore" with m1, m2 models "/sample.gotten/model/dcmodels" metamodel workload "/sample.gotten/model/workload.ecore" with w1, w2 models "/sample.gotten/model/workloads" datacentre input Features { context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() context DataCentre def: CPU: Int = racks->collect( numBoards*board.nodesPerBoard*board.nodeType.CPUCores*board.nodeType.CPUSpeed) ->sum() } output Features { Time: Long Energy: Long } Processor { Name: String Version: String } MetamorphicRelations { MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] MR2 = [ (CPU(m1) > CPU(m2) and w1 == w2) implies (Energy(m1) <= Energy(m2)) ] } DSL FOR METAMORPHIC RELATIONS 45 MR2: if the cloud m1 has better CPUs than m2 and workloads are equal then the cloud m1 consumes less energy (or equal) to process the workload than m2
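Outside the DSL, the semantics of such a relation reduces to an implication over input and output features. A minimal sketch (illustrative only, not how Gotten evaluates MRs internally):

# Evaluating MR1 as an implication over precomputed features (illustrative)
def implies(p, q):
    return (not p) or q

def mr1(nnodes_m1, nnodes_m2, same_workload, time_m1, time_m2):
    # MR1: (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2))
    return implies(nnodes_m1 > nnodes_m2 and same_workload, time_m1 <= time_m2)

print(mr1(nnodes_m1=64, nnodes_m2=32, same_workload=True, time_m1=120, time_m2=150))  # True
print(mr1(nnodes_m1=64, nnodes_m2=32, same_workload=True, time_m1=200, time_m2=150))  # False: MR violated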
  • 46. PROCESSORS Specify • How to transform the input models to the SuT input • How to run the SuT on specific inputs • How to parse the output to obtain the output features (e.g., Energy, Time) Done via extension points defined by Gotten • Eclipse mechanism for extensibility 46 [Diagram: Processor «interface» with operations generate(...), execute(...), getFeatures(…), «implements»-ed by each system under test]
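The shape of that interface, sketched in Python for readability (the actual Gotten processors are contributed in Java through an Eclipse extension point; names and signatures here are illustrative):

# Sketch of the processor concept; one implementation per system under test
from abc import ABC, abstractmethod

class Processor(ABC):
    @abstractmethod
    def generate(self, model_path: str) -> str:
        """Transform the input model into the SUT's native input format."""

    @abstractmethod
    def execute(self, sut_input: str) -> str:
        """Run the SUT on the generated input and return its raw output."""

    @abstractmethod
    def get_features(self, raw_output: str) -> dict:
        """Parse the raw output into output features (e.g., Time, Energy)."""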
  • 47. FOLLOW UPS 47 [Diagram] The MR and the follow-up generation DSL yield transformation rules and a search configuration for MoMOT, which takes the seed models and produces the follow-ups. Martin Fleck, Javier Troya, Manuel Wimmer: Search-Based Model Transformations with MOMoT. ICMT 2016: 79-87
  • 48. FOLLOW-UPS: FOWDSL 48 // MR1 = [ (NNodes(m1) > NNodes(m2) and w1 == w2) implies (Time(m1) <= Time(m2)) ] // context DataCentre def: NNodes: Int = racks->collect(numBoards*board.nodesPerBoard)->sum() followups for datacentre using MR1 with source folder = "/dcmodels" and output folder = "/dcmodels" NNodes -> delete [1..10] Rack; delete [1..10] Board; decrease [1..10] Rack.numBoards; decrease [1..10] Board.nodesPerBoard maxSolutions 10 iterations 10 algorithms [Random, NSGAII, NSGAIII, eMOEA]
  • 49. FOLLOW-UPS: FOWDSL 49 followups for datacentre using MR1 with source folder = "/dcmodels" and output folder = "/dcmodels" NNodes -> decrease [1..4] Rack.numBoards keeping {Rack.numBoards > 0}; decrease [1..4] Board.nodesPerBoard keeping {Board.nodesPerBoard > 0} maximize ( NNodes(m2) - NNodes(m1) ) maxSolutions 10 iterations 8 algorithms [Random, NSGAII, NSGAIII, eMOEA]
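Conceptually, follow-up generation mutates the seed model so that the MR pre-condition holds while respecting the keeping constraints. A toy sketch (illustrative; the real pipeline delegates this to search-based transformation with MoMOT):

# Follow-up generation by mutation (conceptual sketch only)
import copy, random

def n_nodes(dc):
    return sum(r["numBoards"] * r["nodesPerBoard"] for r in dc["racks"])

def follow_ups(seed, how_many=10, rng=random.Random(0)):
    # Produce models with strictly fewer nodes, keeping numBoards/nodesPerBoard > 0
    result = []
    while len(result) < how_many:
        m2 = copy.deepcopy(seed)
        rack = rng.choice(m2["racks"])
        attr = rng.choice(["numBoards", "nodesPerBoard"])
        if rack[attr] > 1:                       # keeping {... > 0}
            rack[attr] -= rng.randint(1, min(4, rack[attr] - 1))
        if n_nodes(m2) < n_nodes(seed):          # satisfies the MR1 pre-condition
            result.append(m2)
    return result

seed = {"racks": [{"numBoards": 4, "nodesPerBoard": 8}, {"numBoards": 2, "nodesPerBoard": 16}]}
print(len(follow_ups(seed)), n_nodes(seed))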
  • 51. GOTTEN 51 Open source, freely available at https://g0tten.github.io/home.html
  • 52. ASSESSING THE APPROACH 52 RQ1: How effective is Gotten at specifying MT environments? • Is it better than specifying the environment by hand? RQ2: How suitable are the environments built with Gotten? • Can we perform MT effectively? RQ3: Can Gotten be used to create MT environments for different domains? • Video streaming • Automata
  • 53. (RQ1) EFFECTIVENESS IN SPECIFYING MT ENVIRONMENTS 53 Cloud simulation domain • Cost of integrating 2 cloud simulators (Dissect, CloudSimStorage) within Gotten • Baseline: ad-hoc MT testing tool called FwCloudMeT FwCloudMeT • Fixed set of MRs • Native follow-up generation Gotten • Extensible set of MRs (DSL) • Follow-ups via MoMOT One order of magnitude less code in Gotten
  • 54. (RQ2) SUITABILITY OF GOTTEN FOR MT 54 Case study: cloud simulation Definition of MRs • 6 MRs Generating follow-ups • 200 follow ups for the 6 MRs • ~11 secs per follow-up MT process • 4 passing MRs for both simulators • Both passing and failing test cases for 1 MR • 1 failing MR for both simulators
  • 55. MRs FOR CLOUD SIMULATION 55 better CPU → less energy; machine increase → increase ratio bigger than energy ratio; better storage → less time
  • 56. 56 MRs FOR CLOUD SIMULATION better network → less time; better memory → less time; shorter workload → less time
  • 57. MR5 (“better memory implies less or equal processing time”) • Neither of the two simulators satisfies this relation • Limitation in the handling of the memory system MR2 (“The proportional increase in machines should be greater than or equal to the proportional increase in energy usage”) • Some scenarios do not meet this expectation • Some idle machines, inefficient scheduling 57 TESTING PROCESS
  • 58. (RQ3) MORE CASES: TESTING VIDEO STREAMING APIS 58 [Diagram: domains are API families (Video Streaming API, Music Streaming API, … are REST APIs); processors are the platforms that implement them (YouTube, Vimeo, … and Spotify, Apple Music, …)] Organise APIs into domains Design metamorphic relations • Applicable across platforms Design test cases • Reusable across platforms Test and compare different platforms
  • 59. MORE CASES: TESTING VIDEO STREAMING APIS 59 "Perform a search, and repeat the search with different search order. Then, the number of videos of each search must be the same" Search: 'world cup' Order: most rated 1- Argentina v France | 19-12-2022 2- Shakira: Waka-Waka | 05-06-2010 3- Spain, road to world cup | 17-09-2020 4- FIFA world cup 2018 | 01-09-2018 5-Brazil vs. France 2006 | 08-08-2006 Search: 'world cup' Order: oldest 1- Brazil vs. France 2006 | 08-08-2006 2- Shakira: Waka-Waka | 05-06-2010 3- Spain, road to world cup | 17-09-2010 4- FIFA world cup 2018 | 01-09-2018 5-Argentina v France | 19-12-2022 Property: Search-1: Search-2: NumVideos(Search-1) == NumVideos(Search-2) ?
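The check itself is simple once search results can be retrieved; a toy sketch with an in-memory stand-in for the platform's search endpoint (search_videos is hypothetical and ignores the query; a real test would call the YouTube/Vimeo REST APIs):

# MR: same query, different ordering criterion => same number of results
VIDEOS = [("Argentina v France", "2022-12-19", 9.8),
          ("Shakira: Waka-Waka", "2010-06-05", 9.5),
          ("FIFA world cup 2018", "2018-09-01", 8.7)]

def search_videos(query, order, max_results=5):
    # Toy stub standing in for a REST search call (ignores the query)
    key = {"rating": lambda v: -v[2], "date": lambda v: v[1]}[order]
    return sorted(VIDEOS, key=key)[:max_results]

def mr_order_invariant_count(query, max_results=5):
    by_rating = search_videos(query, "rating", max_results)
    by_date = search_videos(query, "date", max_results)
    return len(by_rating) == len(by_date)

print(mr_order_invariant_count("world cup"))  # True on this toy data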
  • 60. MORE CASES: YOUTUBE AND VIMEO 60 videostream input Features { context VideoAPITest def: IsSearch: Boolean = request.oclIsTypeOf(SearchVideo) context VideoAPITest def: IsUpdate: Boolean = request.oclIsTypeOf(UpdateVideo) context SearchVideo def: MaxResults: Int = maxResults context SearchVideo def: SearchOrder: Int = orderType } output Features { NVideos : Long OutputVideoId: Long OutputVideoTitle: String } //... MetamorphicRelations { MR1 = [ (IsSearch(m1) and MaxResults(m1) >= MaxResults(m2)) implies (NVideos(m1) >= NVideos(m2))] MR2 = [ (IsSearch(m1) and SearchOrder(m1) <> SearchOrder(m2)) implies (NVideos(m1) == NVideos(m2))] MR3 = [ IsUpdate(m1) and m1 == m2 implies (OutputVideoId(m1) <> OutputVideoId(m2)) and (OutputVideoTitle(m1) == OutputVideoTitle(m2)) ] }
  • 61. TESTING PROCESS Designed 30 test cases • Automatically generated 120 follow-ups All test cases could be reused for YouTube and Vimeo Results: • All tests for MR1 and MR3 passed • ~7% of tests for MR2 failed on each platform: we obtained a different number of videos for search queries with different ordering criteria 61
  • 62. TESTING THE SIMULATOR OR THE MODEL BEING SIMULATED? We can build MRs to test either the simulator or the model Let’s consider MT for Deterministic Finite Automata (DFA) • Consider the MR: (w’ == w.1) and Accept(dfa, w’) implies not Accept(dfa, w) • We are testing that the DFA model behaves according to our expectations Instead, consider the MR: • (FinalStates(dfa2) == States(dfa1)-FinalStates(dfa1)) implies Accept(dfa1, w1) != Accept(dfa2, w1) (swapping final states yields the complement language) • Here we are testing a general property of DFAs that every simulator should fulfil 62
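The second MR, swapping final states to obtain the complement language, is easy to illustrate with a toy DFA (a Python sketch; any correct simulator must give opposite answers on the two automata):

# "Swap final states => complement language" MR for DFAs (illustrative)
def accepts(dfa, word):
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["finals"]

def complement(dfa):
    return {**dfa, "finals": dfa["states"] - dfa["finals"]}

# Toy DFA over {0,1} accepting words with an even number of 1s
dfa1 = {"states": {"even", "odd"}, "start": "even", "finals": {"even"},
        "delta": {("even", "0"): "even", ("even", "1"): "odd",
                  ("odd", "0"): "odd", ("odd", "1"): "even"}}
dfa2 = complement(dfa1)

for w in ["", "1", "10", "1101"]:
    # MR: the two automata must give opposite answers on every word
    assert accepts(dfa1, w) != accepts(dfa2, w)
print("MR holds on all sample words")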
  • 63. TESTING THE SIMULATOR OR THE MODEL BEING SIMULATED? 63 Model Input MRs involving modifications of the model usually test the simulator Simulator
  • 64. TESTING THE SIMULATOR OR THE MODEL BEING SIMULATED? 64 MRs involving modifications of the model’s input usually test the model Model Input Simulator
  • 65. AI FOR METAMORPHIC TESTING Generative AI Large Language Models (LLMs) • Produce suitable answers out of natural language text Extensively used today in many domains API-based integration in applications • Agent-based programming • LangGraph, AutoGen, CrewAI, … 65 Text Text LLM LLM task1 task2 … shared state (prompt) prompt Structured output
  • 66. AI FOR METAMORPHIC TESTING Use large language models to help the tester Assistive tasks • Create an MR out of natural language text • Explain an MR in natural language • Simplify an existing MR • Combine existing MRs • Derive new MRs • Derive N follow-up test cases of a model for an MR • Derive initial test cases • … 66
  • 67. AI FOR METAMORPHIC TESTING: STRATEGY Prompt engineering • Explanation of Gotten • Grammar of Gotten (Xtext) • Meta-model of the domain • Current model • Current selection in tool • Specific prompt depending on task Agent workflow • Some tasks solved with the LLM • Safeguards: error checking and repair 67
  • 68. AGENT WORKFLOW TO CREATE MR 68 [Workflow diagram] The Gotten IDE sends the prompt to the LLM (e.g., OpenAI); a task-classification step routes MR-creation requests to a “create MR out of NL” task (all other tasks are handled separately); the generated Gotten code goes through a syntax checker; reported errors are passed to an MR fixer, which iterates with the LLM until the Gotten code compiles
  • 69. AGENT WORKFLOW TO CREATE FOLLOW UPS 69 [Workflow diagram] The Gotten IDE sends the prompt to the LLM (e.g., OpenAI); a task-classification step routes the request to a “create N follow-ups” task (all other tasks are handled separately); each generated EMF model is passed to an EMF checker (errors trigger a syntactic-repair loop) and then to an MR checker (errors trigger a semantic-repair loop); candidates that cannot be repaired are reported as errors; the cycle repeats while more follow-ups are needed, and the correct candidates become the generated models
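A conceptual sketch of that generate/check/repair cycle (illustrative only: call_llm and both checkers are dummy placeholders, and the real assistant works on EMF models and the Gotten DSL through OpenAI models):

# Generate/check/repair loop for follow-up creation (all components are dummy stand-ins)
def call_llm(prompt):
    # Placeholder: a real implementation would call an LLM API here
    return "rack numBoards=3 nodesPerBoard=8\n---\nrack numBoards=2 nodesPerBoard=8"

def syntactic_errors(model_text):
    # Placeholder for the EMF conformance check
    return []

def semantic_errors(model_text, mr):
    # Placeholder for the MR pre-condition check on the candidate follow-up
    return []

def generate_follow_ups(seed_text, mr, n=2, max_repairs=2):
    models = []
    prompt = f"Generate {n} follow-up models for MR '{mr}' from:\n{seed_text}"
    for candidate in call_llm(prompt).split("\n---\n"):   # assumed response format
        for _ in range(max_repairs + 1):
            errors = syntactic_errors(candidate) or semantic_errors(candidate, mr)
            if not errors:
                models.append(candidate)
                break
            candidate = call_llm(f"Fix these errors:\n{errors}\nin:\n{candidate}")
    return models

print(generate_follow_ups("rack numBoards=4 nodesPerBoard=8", "NNodes(m1) > NNodes(m2)"))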
  • 70. PRELIMINARY RESULTS FOR FOLLOW UP GENERATION 70 Task: Generate N follow-up models from seed model and MR. Model: gpt-4.1-mini, temperature=0.7. Seed model size: 6 objects [1 rack]
    MR                                                   | N  | Correct (syn) | Correct (sem) | Diff | Median (ms)
    CPU(m1) > CPU(m2) and w1 == w2 implies …             | 10 | 10            | 10            | 10   | 3296
    NMachines(m1) > NMachines(m2) and w1 == w2 implies … | 10 | 10            | 10            | 10   | 3351
    Storage(m1) > Storage(m2) and w1 == w2 implies…      | 10 | 10            | 10            | 10   | 3606
    Network(m1) > Network(m2) and w1 == w2 implies…      | 10 | 10            | 10            | 10   | 3448
    Memory(m1) > Memory(m2) and w1 == w2 implies…        | 10 | 10            | 10            | 10   | 3088
  • 71. PRELIMINARY RESULTS FOR FOLLOW UP GENERATION 71 Task: Generate N follow-up models from seed model and MR. Model: gpt-4.1-mini, temperature=0.7. Seed model size: 18 objects [4 racks]
    MR                                                   | N  | First loop: Correct (sem) | First loop: Median (ms) | Repair: Correct (sem) | Repair: Median (ms)
    CPU(m1) > CPU(m2) and w1 == w2 implies …             | 10 | 8 | 6246 | 2 | 7889
    NMachines(m1) > NMachines(m2) and w1 == w2 implies … | 10 | 6 | 7982 | 4 | 5514
    Storage(m1) > Storage(m2) and w1 == w2 implies…      | 10 | 8 | 7788 | 2 | 5569
    Network(m1) > Network(m2) and w1 == w2 implies…      | 10 | 8 | 6488 | 2 | 6615
    Memory(m1) > Memory(m2) and w1 == w2 implies…        | 10 | 9 | 6611 | 1 | 8946
  • 72. COMMENTS… THE POSITIVE Promising results • All models syntactically correct at first generation • All models semantically correct (are follow-ups) at first generation Faster than using MoMOT • Search-based transformation is very heavyweight All follow-ups are different • This requirement was included in the prompt • But all of them are structurally equal (they differ only in attribute values) 72
  • 73. COMMENTS… THE CAVEATS Caveats: • Relatively simple MRs • Simple seed models Discussion • Is this the right approach to follow-up generation? • SAT solving/Search-based optimisation vs. Prompt + repair cycles • Solid engineering vs. More fragile systems • Measure structural diversity of solutions • Handling constraints that could not be handled before (e.g., on Strings)? 73
  • 74. CONCLUSIONS Metamorphic testing helps assess systems that are hard to test • Simulators often fall in this category Metamorphic relations • Involve several inputs and their expected results • Both as oracles and to generate follow-up test cases The Gotten framework helps create metamorphic testing environments • Based on MDE principles • Examples for cloud simulation AI for metamorphic testing • Agentic workflows based on LLMs to help in several tasks 74
  • 75. OUTLOOK Metamorphic testing in other domains (involving simulation) • If you have a case study, let’s talk! Systematic assessment of AI assistance quality • Evaluation of each task Metamorphic testing to assess conversational agents • User simulation + metamorphic rules 75
  • 76. MORE ABOUT GOTTEN 1. GOTTEN: A model-driven solution to engineer domain-specific metamorphic testing environments. Pablo Gómez-Abajo, Pablo C. Cañizares, Alberto Núñez, Esther Guerra, Juan de Lara. 2023. In ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MoDELS 2023), Västerås. 2. Automated engineering of domain-specific metamorphic testing environments. Pablo Gómez-Abajo, Pablo C. Cañizares, Alberto Núñez, Esther Guerra, Juan de Lara. 2023. In Information and Software Technology (Elsevier). 3. New ideas: Automated engineering of metamorphic testing environments for domain-specific languages. Pablo C. Cañizares, Pablo Gómez-Abajo, Alberto Núñez, Esther Guerra, Juan de Lara. 2021. In ACM SIGPLAN International Conference on Software Language Engineering (SLE 2021), Chicago. Best new ideas/vision paper award at SLE’21 76 https://g0tten.github.io/home.html