Skip to content

Update tests for relaxed

from
This PR updates the tree-construction dat files for the HTML change
which will allow additional tags within 
    
      
    

@flavorjones
Copy link
Contributor

FWIW when CI workflows are enabled, Nokogiri (downstream) tests will fail. I've started working on a branch with the proposed changes from whatwg/html#10557

@josepharhar
Copy link
Author

@flavorjones
Copy link
Contributor

Nokogiri work-in-progress at sparklemotion/nokogiri#3317

flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Oct 16, 2024
flavorjones added a commit to flavorjones/html5lib-tests that referenced this pull request Oct 16, 2024
@flavorjones
Copy link
Contributor

@josepharhar I've got a question about two tests that are very similar. Zooming in on this one from tree-construction/tests1.dat:

#data

Nokogiri is constructing a different tree:


  
      

and I wanted to ask for a double-check that the test's assertion is correct, before I dive into Nokogiri's parser. Thank you!

@josepharhar
Copy link
Author

Thanks for asking!

It looks like Nokogiri is nesting one and the , I think there is a mismatch between my spec PR and the chromium implementation - specificially the chromium implementations stops in this case before reconstructing formatting contexts, which isn't in the spec: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/html/parser/html_tree_builder.cc;l=951;drc=9776ce6636ab1d68d2b1c4f1f719eb71becf39a3

I'll take a closer look at which we should do and get back to you on that. Thanks!

@flavorjones
Copy link
Contributor

flavorjones commented Oct 17, 2024

Thanks for replying so quickly.

It looks like Nokogiri is nesting one select inside the other

No, sorry, unless I'm misunderstanding your comment, this is not a correct description of what Nokogiri's parser is doing. Here's a more graphical representation of the tree from my previous comment:

body
├── select
│   └── b
│       └── option
└── b
    └── select
        └── option

i.e. .

The select tags are not nested. Just wanted to clarify. (And if it's helpful context, Nokogiri is maintaining a fork of libgumbo.)

@josepharhar
Copy link
Author

Ah whoops, I failed to read the tree properly 😅

I looked into why chromium is putting the b inside the select instead of the other way around, and i found that the tag in the test is what's making the difference - without it I am getting the same output as Nokogiri.

This is happening due to this call to the adoption agency algorithm when is parsed: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody:adoption-agency-algorithm-4

I verified that chromium does the same thing as Nokogiri when I comment out that call to the adoption agency algorithm.

There's a lot of steps in that algorithm and I'm having a hard time wrapping my head around it, but does Nokogiri have that call to the adoption agency algorithm too? Or perhaps the implementations of the algorithm are different?

@flavorjones
Copy link
Contributor

@josepharhar Thanks again for your kind reply!

Yes, Nokogiri's libgumbo has implemented the adoption agency algorithm, and I have confirmed that in these tests we are invoking it. It's helpful to know this is likely the source of the behavior difference, so I'll focus my efforts on making sure it matches the current spec (though I'll note it passes every other test in this suite ... 🤷).

@flavorjones
Copy link
Contributor

flavorjones commented Oct 18, 2024

OK, I think I know what's going on here. I think Chromium has missed this change:

image

https://github.com/whatwg/html/pull/10557/files#diff-41cf6794ba4200b839c53531555f0f3998df4cbb01a4d5cb0b94e3ca5e23947dR124618

If select is not in the list for default scope tags, then this adoption agency step:

If formattingElement is in the stack of open elements, but the element is not in scope, then this is a parse error; return.

will not trigger a parse-error-and-return, and the algorithm will continue. But select is in the list now, and so the formatting element b is not in scope.

If I remove the select tag from the default scope tags in Nokogiri's libgumbo, then behavior matches the test (and presumably chromium).

Can you check to see if my hunch is right?

@josepharhar
Copy link
Author

Thanks so much for figuring this out! Yeah I totally missed that in the chromium implementation but I just added it and updated the tests here.

@flavorjones
Copy link
Contributor

@josepharhar That's great! Thank you!

I've got a patch to get the error messages to a point where libgumbo is passing, would you mind taking a look and potentially applying to this PR? (Renamed to .txt to get by the github attachment filter.)

select-test-errors.patch.txt

@josepharhar
Copy link
Author

Looks good, thanks! I applied it.

Sometimes I wonder why chromium throws away parse errors.

aarongable pushed a commit to chromium/chromium that referenced this pull request Oct 30, 2024
This change was included in the parser spec changes but I forgot to add
it to the implementation. It was identified here:
html5lib/html5lib-tests#178

Change-Id: I13f7ba11dc2dda814e488829a05fe4ee7c670d52
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5948083
Commit-Queue: Joey Arhar 
Reviewed-by: David Baron 
Cr-Commit-Position: refs/heads/main@{#1375607}
untitaker added a commit to untitaker/parse5 that referenced this pull request Nov 1, 2024
html5lib/html5lib-tests#178

The proposal isn't merged yet, and the error codes are off. The forked
html5lib-tests makes it more complicated. I recommend to ditch the fork
of html5lib-tests and give up on standardizing errors.
flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Nov 23, 2024
flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Nov 23, 2024
flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Nov 25, 2024
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Dec 4, 2024
This patch makes the HTML parser convert nested  instead of 
This patch makes the HTML parser convert nested  instead of 
This patch makes the HTML parser convert nested  instead of 
This patch makes the HTML parser convert nested  instead of 
Context here:
whatwg/html#10557 (comment)

Tests will be added here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I62cd28979abff3d5ea30b3502872f273a74bfde4
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6042598
Commit-Queue: Joey Arhar 
Reviewed-by: David Baron 
Cr-Commit-Position: refs/heads/main@{#1391836}
@josepharhar
Copy link
Author

I just updated this PR, which mostly involved reverting stuff. Should all be up to date and in sync with the whatwg/html PR now.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Dec 10, 2024
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of 

This patch makes the HTML parser convert nested  instead of 
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of 

This patch makes the HTML parser convert nested  instead of 
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of 

This patch makes the HTML parser convert nested  instead of 
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of 

This patch makes the HTML parser convert nested  instead of 
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of 

This patch makes the HTML parser convert nested  instead of 

flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Dec 19, 2024
@josepharhar
Copy link
Author

Please let me know if I can help format these for you.

That would be great! I could try but I might mess it up 😅

@flavorjones
Copy link
Contributor

flavorjones commented Jan 19, 2025

@josepharhar I've attached a patch file that indicates the number and nature of the error messages generated by libgumbo.

I want to acknowledge @untitaker's comment above that these error messages do not follow the format of other existing errors or the format suggested by whatwg/html#1339, but this is the best I can do in the time I've allotted.

error_messages.patch.txt

@josepharhar
Copy link
Author

Thanks for generating those errors! I applied them in the latest commit

@josepharhar josepharhar mentioned this pull request Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants