ENH: Improve assertion message for assert_frame_equal
#39967
Comments
Why not show only the differing values instead of the whole columns?
Agreed - this would be even more convenient.
PR would be welcome.
take
Trying to use your helper function, but one of the variable names is incorrect (I think). You call
but we don't have a
This improvement would be great. Is there any chance the PR exists somewhere as a draft? I could help complete it. I don't know what the initial intention was for what to show for the index/columns, but one way to address @benhammondmusic's comment would be to use the left dataframe as the reference for the index/columns.
Wouldn't pd.DataFrame.compare be a good fit for the assertion message? Also, it can be very helpful to know which tolerance was violated (at least for the whole dataframe, not per difference).
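For reference, a minimal sketch of what DataFrame.compare reports, using made-up data (the frames and column names here are illustrative assumptions, not taken from the issue). It keeps only the rows and columns that differ, which is close to what the comments above ask for:

```python
import pandas as pd

# Illustrative data: two frames that differ in a single timestamp cell.
left = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "ts": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    }
)
right = left.copy()
right.loc[1, "ts"] = pd.Timestamp("2021-01-05")

# compare() drops rows and columns that are equal; for each remaining
# column, "self" holds the left value and "other" holds the right value.
diff = left.compare(right)
print(diff)
```

Note that compare requires both frames to be identically labeled (it raises ValueError otherwise) and performs an exact comparison, so on its own it would not say which tolerance was violated.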
Problem description
For testing data pipelines using pandas, I usually use assert_frame_equal to compare expected and resulting dataframes. However, in some circumstances (e.g. test dataframes with more than 20 rows/columns and timestamps), the resulting assertion message may not provide enough information to easily identify the difference between the expected and resulting dataframes.
Consider the following example with timestamp columns:
The resulting assertion message is the following:
It is already hard to spot the actual difference, even though we have only 2 columns with 10 rows. The message tells me neither the column name nor the affected indices.
Proposal
The assertion message could include the name of the column and differing indices, like:
I can find the differences by using some additional boilerplate code; however, it would be more convenient to have such information in the assertion message right away.
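A rough sketch of the kind of information meant here, under the assumption that both frames share the same index and columns. The function name describe_differences and the message wording are hypothetical, not part of pandas:

```python
import pandas as pd


def describe_differences(left: pd.DataFrame, right: pd.DataFrame) -> list:
    """Return one line per differing column, naming the affected index labels.

    Hypothetical helper; assumes both frames have identical index and columns.
    """
    lines = []
    for column in left.columns:
        l, r = left[column], right[column]
        # Positions where both sides are missing count as equal.
        differs = (l != r) & ~(l.isna() & r.isna())
        if differs.any():
            labels = left.index[differs.to_numpy()].tolist()
            lines.append(f'Column "{column}" differs at index {labels}')
    return lines


# Illustrative data: one timestamp differs at index label 2.
expected = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "ts": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    }
)
result = expected.copy()
result.loc[2, "ts"] = pd.Timestamp("2021-02-01")

report = describe_differences(expected, result)
print("\n".join(report))
```

This does an exact comparison only; a version that respects rtol/atol would need to apply the same tolerance logic as assert_frame_equal.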
API breaking implications
There should be no breaking API changes. However, assert_extension_array_equal and assert_numpy_array_equal may need an additional keyword argument in order to pass the column name.
Describe alternatives you've considered
Currently, I use the following helper function to see the actual differences:
This works nicely, but I'd rather have the information in the assertion message already.
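As a stopgap, one possible approach is a thin wrapper that re-raises the original AssertionError with the differing cells appended. This is a sketch under stated assumptions: assert_frame_equal_verbose is a hypothetical name, it is not part of pandas, and it is not the helper function referred to above.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_frame_equal_verbose(left, right, **kwargs):
    """Call assert_frame_equal and, on failure, append the differing cells.

    Hypothetical wrapper for illustration only.
    """
    try:
        assert_frame_equal(left, right, **kwargs)
    except AssertionError as err:
        try:
            # compare() needs identically labeled frames; it raises
            # ValueError when the index or columns do not match.
            details = left.compare(right).to_string()
        except ValueError:
            details = "(frames are not identically labeled; no cell-level diff)"
        raise AssertionError(f"{err}\n\nDiffering cells:\n{details}") from err
```

Because compare is exact, the appended details can be empty when the failure comes from something other than cell values, for example a dtype mismatch.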
If agreed, I could provide a PR myself. Thanks for looking at it anyway.