Paths

Table of Contentst

Diffusion 0 A.D. Public Repository rP23773

Don't overwrite JS error reporting by calling JS_ReportError.
Needs VerificationrP23773
Actions

Description

Don't overwrite JS error reporting by calling JS_ReportError.

Upstream spidermonkey supports JSNative error reporting by returning an explicit failure code. This provides a full stacktrace. Calling JS_ReportError, removed upstream, removes that stacktrace.

Instead, we should simply LOGERROR.

Based on a patch by: elexis

Reviewed By: wraitii

Differential Revision: https://code.wildfiregames.com/D2627

Details

Auditors

elexis

Committed

wraitii

Jun 14 2020, 12:34 PM

Reviewer

wraitii

Differential Revision

D2627: Report GUI object script handler error stack

Parents

rP23772: build-osx-libs: Update GnuTLS to 3.6.13 (unbreak macOS 10.15)

Branches

Unknown

Tags

Unknown

Build Status

Buildable 12097
Build 22793: Trigger Windows Autobuild
Build 22792: Post-Commit Build	Jenkins

Event Timeline

wraitii committed rP23773: Don't overwrite JS error reporting by calling JS_ReportError..Jun 14 2020, 12:34 PM

wraitii added an edge: D2627: Report GUI object script handler error stack.

Harbormaster completed building B12097: rP23773: Don't overwrite JS error reporting by calling JS_ReportError..Jun 14 2020, 12:46 PM

Instead, we should simply LOGERROR.

I'm not sure if I would agree with that part. This is not an "instead" but an "additionally" (JS exceptions should always be reported when they occur as such, so they can be caught with try/catch or otherwise abandon the JS execution and report the full stack etc).

When applying the testplan I posted to the committed patch:

ERROR: Errors executing script event "SelectionChange"
ERROR: Errors executing script event "SelectionChange"
ERROR: Errors executing script event "SelectionChange"
ERROR: Errors executing script event "SelectionChange"
ERROR: Errors executing script event "SelectionChange"
ERROR: Errors executing script event "SelectionChange"
ERROR: Errors executing script event "SelectionChange"
ERROR: JavaScript error: gui/gamesetup/Pages/GameSetupPage/GameSettings/GameSettingControlDropdown.js line 51
Error: 
ERROR: GUI page 'page_gamesetup.xml': Failed to call init() function

With my version of the patch:

ERROR: JavaScript error: gui/gamesetup/Pages/GameSetupPage/GameSettings/GameSettingControlDropdown.js line 51
Error: 
ERROR: GUI page 'page_gamesetup.xml': Failed to call init() function
ERROR: GL error GL_INVALID_VALUE (0x0501) occurred

Is this intended by the author of the committed patch that LOGERROR is printed 7 times but the stack only once?

Aside from that:

../../../source/gui/Scripting/JSInterface_IGUIObject.cpp: In function ‘bool JSI_IGUIObject::getProperty(JSContext*, JS::HandleObject, JS::HandleId, JS::MutableHandleValue)’:
../../../source/gui/Scripting/JSInterface_IGUIObject.cpp:125:2: error: ‘LOGERROR’ was not declared in this scope
  125 |  LOGERROR("Property '%s' does not exist!", propName.c_str());
      |  ^~~~~~~~
../../../source/gui/Scripting/JSInterface_IGUIObject.cpp: In function ‘bool JSI_IGUIObject::setProperty(JSContext*, JS::HandleObject, JS::HandleId, JS::MutableHandleValue, JS::ObjectOpResult&)’:
../../../source/gui/Scripting/JSInterface_IGUIObject.cpp:162:4: error: ‘LOGERROR’ was not declared in this scope
  162 |    LOGERROR("on- event-handlers must be functions");
      |    ^~~~~~~~
../../../source/gui/Scripting/JSInterface_IGUIObject.cpp:175:2: error: ‘LOGERROR’ was not declared in this scope
  175 |  LOGERROR("Property '%s' does not exist!", propName.c_str());
      |  ^~~~~~~~
../../../source/gui/Scripting/JSInterface_IGUIObject.cpp: In function ‘bool JSI_IGUIObject::deleteProperty(JSContext*, JS::HandleObject, JS::HandleId, JS::ObjectOpResult&)’:
../../../source/gui/Scripting/JSInterface_IGUIObject.cpp:202:2: error: ‘LOGERROR’ was not declared in this scope
  202 |  LOGERROR("Only event handlers can be deleted from GUI objects!");
      |  ^~~~~~~~
make[1]: *** [gui.make:262: obj/gui_Release/JSInterface_IGUIObject.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:127: gui] Error 2

This commit now has outstanding concerns.Jun 14 2020, 2:01 PM

In rP23773#42862, @elexis wrote:

(JS exceptions should always be reported when they occur as such, so they can be caught with try/catch or otherwise abandon the JS execution and report the full stack etc).

This is already happening.

Is this intended by the author of the committed patch that LOGERROR is printed 7 times but the stack only once?

I'll say yes.

I'm assuming your error comes when compiling without precompiled headers?

Index: source/gui/Scripting/JSInterface_IGUIObject.cpp
===================================================================
--- source/gui/Scripting/JSInterface_IGUIObject.cpp     (revision 23773)
+++ source/gui/Scripting/JSInterface_IGUIObject.cpp     (working copy)
@@ -23,4 +23,5 @@
 #include "gui/CGUISetting.h"
 #include "gui/ObjectBases/IGUIObject.h"
+#include "ps/CLogger.h"
 #include "scriptinterface/ScriptExtraHeaders.h"
 #include "scriptinterface/ScriptInterface.h"

To fix the build without pch.

wraitii mentioned this in D2819: Fix --without-pch build following rP23773.Jun 14 2020, 10:34 PM

wraitii mentioned this in rP23775: Fix --without-pch build following rP23773.Jun 14 2020, 10:51 PM

Observation:

Is this intended by the author of the committed patch that LOGERROR is printed 7 times but the stack only once?

I'll say yes.

For me as the author of the patch based on the commit, it was not.
This behavior had not been described by me nor by you in the revision.
As far as I see, developers who have run into a JS exception should also not expect this (for the reasons described further below).

Surprises:
(S1) To me it's surprising that it's 7 and not 8 LOGERROR and
(S2) It's surprising that it's 7 LOGERROR + 1 Exception, because the expectation is that every JS exception is reported with stack and not selectively 1 in 7 while each of them is LOGERRROR reported simultaneously.

Questions:
An observation which was revealed by the addition of LOGERROR and performing testing the modified code which was the same as performing the steps of the test plan. Hence
(Q1) Does Reviewed By: wraitii include performing Test Planfield?
(Q2a) If the answer is well-yes-but-actually-no, then could there be potential benefit including efficiency in doing so?
(Q2b) If the answer is actually yes, why was the consequence to ignore it instead of reason and documenting or changing it?
(Q3) Does I'll say yes refer to I don't think we have to have the level of carefulness that @elexis claims we need / I do care about the code quality. Not as much as you, certainly?

Technical analysis:
Turns out for default skirmish map its 7 messages.
When I select the random random map, set to 8 players, save persistmatchsettings, then add the throw and start again, I will get 8 messages.

This leads me to the hypothesis that its the recursive onGameAttributesChange approach rather than the fixed number of max 8 players.
If this is true, then it would mean that IGUIObject::ScriptEventWithReturn is called recursively N times, prints the LOGERROR N times and prints the stack only once.

That would imply that it switches between JS and C++ repeatedly, that there can be N JS exceptions that are thrown but only one of them reported.
If that is true and the objective of the patch includes reporting all JS exceptions, then the patch is wrong.

So I tested again:

Index: binaries/data/mods/public/gui/gamesetup/Pages/GameSetupPage/GameSettings/GameSettingControlDropdown.js
===================================================================
--- binaries/data/mods/public/gui/gamesetup/Pages/GameSetupPage/GameSettings/GameSettingControlDropdown.js	(revision 23775)
+++ binaries/data/mods/public/gui/gamesetup/Pages/GameSetupPage/GameSettings/GameSettingControlDropdown.js	(working copy)
@@ -48,6 +48,8 @@
 
 	onSelectionChangeSuper()
 	{
+		error("print error");
+		throw new Error("throw error");
 		if (!this.isInGuiUpdate)
 			this.onSelectionChange(this.dropdown.selected);
 	}

Then I get

ERROR: print error
ERROR: Errors executing script event "SelectionChange"
ERROR: print error
ERROR: Errors executing script event "SelectionChange"
ERROR: print error
ERROR: Errors executing script event "SelectionChange"
ERROR: print error
ERROR: Errors executing script event "SelectionChange"
ERROR: print error
ERROR: Errors executing script event "SelectionChange"
ERROR: print error
ERROR: Errors executing script event "SelectionChange"
ERROR: print error
ERROR: Errors executing script event "SelectionChange"
ERROR: JavaScript error: gui/gamesetup/Pages/GameSetupPage/GameSettings/GameSettingControlDropdown.js line 52
Error: throw error

Given that there are N "print error" occurrences, there must have been N JS exceptions, but there is only one (the last one) reported.

After a JS exception, the code path is terminated, so it seems wrong that it still continues here.

The return value of the modified function was introduced in 23652 and used in 23653. I reverted both using svn merge -c -REV . to test if it changed something, but it didn't.
Removing the LOGERROR also doesn't change the fact N exceptions are thrown but only one of them reported (obviously but also experimentally confirmed), so my original version of the patch had the same issue that was revealed by the addition of LOGERROR.

From what I understand, a JS exception thrown should (1) report each exception and (2) stop that code path after the first uncaught exception (runtime error).
(Probably shouldn't have to prove that but I guess the first statement that I dont prove is the one to be declared subjective)
https://tc39.es/ecma262/#sec-error-handling-and-language-extensions

An implementation shall report all errors as specified, except for the following:
All errors that are not early errors are runtime errors.

I couldn't find a definition for runtime error that would show that it must actually stop evaluation of the code - if it's considered controversial I can look further (also source is not ES version 6, but that shouldn't matter either).

So the patch changed which stack was reported but didnt address the other seeming issue that was revealed by the LOGERROR.

The expected output would be either exactly one new Error/LOGERROR()/error() tuple (if one JS message triggers the N JS function calls that have the 1 exception each)

ERROR: print error
ERROR: Errors executing script event "SelectionChange"
ERROR: JavaScript error: gui/gamesetup/Pages/GameSetupPage/GameSettings/GameSettingControlDropdown.js line 52
Error: throw error

or all of that N times (if C++ originates the JS function calls), but not 6 unreported exceptions + 7 LOGERRORS +1 reported exception.

Concerns:
(C1) Technical issue: an issue in error handling was addressed, another issue in error handling that has existed before but was revealed by the patch has surfaced following the LOGERROR
(C2) Systematic issue: it seems the test plan was not performed (because running the test plan would have yielded the observation and because the observation would have implied a changed patch or bugreport).
(C3) Systematic issue: When asking about the unexpected observation, it was cast aside with a I'll say Yes instead of an let me check.

That list excludes the concerns of precompiled headers compilation failure and the rush to commit the patch 80 minutes after it was modified after it stood accepted for 2 weeks.
The technical concern isn't so crucial, because JS exceptions should never happen to begin with and if they happen the game is broken already.
The systematic issues are, because they repeated so often and are not deemed issues but that its an issue to point out issues.

Warrantedness:
I am tolerant as to how many issues are introduced in patches (There is even a saying that in order to make less errors you have to make more errors).
I am less tolerant in how systematic issues are repeatedly talked into irrelevance, doomed to be repeated and actually repeated since at least 5 years.
When I worked here I tried to learn from every mistake and tried to never repeat an error.
But you repeated on multiple occasion throughought the years when not only leper or me asked you about such systematic issues on the development platforms that the technical concerns are not important enough to change the way how we examine and test code, that the correct solution is to "leave me alone, I know what I'm doing" or, "I don't think we have to have the level of carefulness that @elexis claims we need." [26349-troubles/], or "I do care about the code quality. Not as much as you, certainly" [2016-06-25-QuakeNet-#0ad-dev.log].
If the conclusion is not that all modified code paths of patches must be tested before committing and before review request, not that performed reviews must be complete and completely documented, not that discovering and predicting possible issues from static review and not that unexpected code behavior must be investigated, but that the people pointing them out must be banned because they are bothersome with their high standards and tenacity, I can also only reproduce the observation "aren't how things should be done" [2016-06-25-QuakeNet-#0ad-dev.log] and "why am I still here?" [2015-11-05-QuakeNet-#0ad-dev.log], because the alternative to that is that the errorrate per release increases by at least an orders of magnitude, that there are more issues introduced than fixed, that the claimed objectives of the organization such as playable releases or a finished game, or a finished game as good as or better than the vision of the founders cannot be achieved, and that has already been seen and testified by the playerbase [for example in https://www.youtube.com/watch?v=uqHzawvxhO8 "20:20 welcome to alpha 24 where everything is broken", "8:50 there must be some big changes, halfway through, like half implemented something", "9:04 whatever it is, I'm keen to see what it is whenever it is done, because its some freaky stuff right now"].
I am very well aware of how extremely much time it costs to quadruplecheck every line 5 times to make sure there is nothing unexpectedly wrong with an action, but it is necessary if one wants to achieve triple A status and not commit and not release something significantly regressed.
I claim proportionateness because it's not about C1 but C2, C3, the developer bodycount relating to that 1.5, the errorrate consistent, the willingness to change noticeable in availability to write few unit tests and readiness to revert 2 commits, but not in carefulness and thoroughness about the unit tests, code and reviews performed.
And I don't "leave you with two non-absurd choices", "Stop reviewing, stop committing.", "Completely ignore whatever you're saying", because the third option is to continue a ban discussion and the other one is to check code more thoroughly in the interest of your, mine, the userbase and the organization, test all modified code paths, follow up unexpected code behavior, plan things before implementing them, cross reference plans with previous plans, and follow instead of casting aside uncomfortable reports that arent glaring obvious defects.
Perhaps there is a very good reason to LOGERROR 7 times and report only one JS exception. If there is, you have not posted it in the review and not not answered it either when I asked you about it.
It's now 4am and I spent 5 hours reviewing and clarifying my post, nothing will change except for the ban highscore, regardless, thanks for the well intended commit, have a nice day and enjoy precious life.

Thanks for investigating this.
My understanding is that C++ code is not stopping after the first JS error, thus calling subsequent handlers (and JS complies as expected). What exactly is causing this seems unimportant wrt to the issue.
To fix this, I only see one pattern: somehow raise the information from ScriptEvent to the calling C++ code that it needs to stop and let the JS exception be reported.

One option is throwing a C++ exception, since the boolean return value is used to indicate something different (see D2727). It's a typical 'trilean' -> true, false, ERROR. No other GUI code that I know of throws an exception, so it seems like something that shouldn't be looked at in a vacuum -> for another day.

Another alternative is calling JS_IsExceptionPending after each call to a ScriptEventWithReturn. This seems more productive, but less systematic, and somewhat invasive.
It would probably be a better direction to have LOGERROR (or a variant thereof) also check for pending JS exceptions, make sure JS reports them correctly, then moves on.
That also seems like a good ticket for another day, as the error reporting mechanics are changing in upstream SpiderMonkey.

There is in fact already a ticket for this -> https://trac.wildfiregames.com/ticket/742 .

The concern about PCH build is fixed, the architectural concern with correctly reporting JS errors is still in the air, unrelated to this particular patch, and with the situation being arguably improved (since we now report, in one case, that several errors are happening - it's something), I am requesting verification of this commit.

This commit now requires verification by auditors.Jun 15 2020, 2:06 PM

The trouble is actually subtly worse than the commit or subsequent comments outline.

ObjectOpResult actually overwrites the error when fail() is called, using instead whatever reason is given. This is generally fine, but if JS_ReportError is called before, the stack trace is wiped (dunno why).
One can JS_ReportError, but should then process exceptions before returning fail(), which then reports its own error.

Therefore, IMO, we ought to LOGERROR errors that are one-level deep, and for code that may or may not fail, check for pending exceptions.

wraitii mentioned this in D3093: Improve JS Exception handling..Nov 11 2020, 7:08 PM