CPU profiling

uckelman · June 15, 2020, 12:07pm

Thus spake Brent Easton:

In 2009(!), we changed Decorator (and BasicPiece) to implement the
PropertyNameSource interface which allows every Decorator to expose the
names of the properties they maintain. I use this in the Beanshell
expression builder to create lists of properties that can be inserted
into the current expression.

I expect it will be FAR more efficient to extract the exposed property
values into a Map and save that as the SNAPSHOT to use in the
ReportState trait, than cloning the piece. You will also need to review
the existing Decorator.getPropertyNames() implementations to make sure
they are up to date.

I think you will get far more improvement for substantially less work.

I did some work on this which never made it to a release. What I had was
a top-level piece wrapper containing a HashMap from property names to
getters and setters for the properties. What this does is change getting
or setting a property from being O(n) to O(1), which should make a
massive difference with pieces that have a long list of traits. (It
might be the single most significant optimization we could do for a
lot of modules.)

I still have it somewhere. I should dig that out.

–
J.

Flint1b · June 15, 2020, 12:39pm

Sounds good! This leads to another question though: At the moment the PieceCloner works on GamePiece-type objects, but GamePiece is not related to PropertyNameSource. All types that implement GamePiece except for the Stack and the Deck also extend or implement PropertyNameSource. For this change the GamePiece would have to extend PropertyNameSource, but how would Stack and Deck implement this interface, whose properties would they return, those of all their contents or just the properties of a single piece inside them or no properties at all?

Another two slightly off-topic questions about these properties, are the property keys always of type String? Are the only possible property keys the ones defined in the Properties interface, and could they be changed into an enum?

Brent_Easton · June 15, 2020, 1:16pm

For the generation of the SNAPSHOTS for the ReportState, the PieceCloner is only ever called with a Decorator, or potentially a BasicPiece (For a piece consisting of a BasicPiece trait only), both of which implement PropertyNameSource.

Even though the Inner piece of a Decorator is defined as a GamePiece, it can never be anything except a Decorator or a GamePiece.

Sadly, a BasicPiece is not a Decorator, which will lead to some instanceof and casting ugliness somewhere.

There is no need to try and support Stacks or Decks.

Yes, the property names are always Strings. But the Properties interface only contains the property names that are commonly used from components outside a piece. There are many more properties used internally by various traits and some of those names are variable between modules (created by the module designer - Case in point, Dynamic Property names). getPropertyNames() in the PropertyNameSource interface returns those names.

Flint1b · July 9, 2020, 9:51pm

I have made a rough proposal now under PR #75 github.com/vassalengine/vassal/pull/75, can someone, maybe Brent, have a look at this and give me some feedback? This downcasting sure looks unsafe but if we are 100% sure that only objects of the proper type ever pass through this code then casting like this is ok.

Also:

You haven’t used reflection for this have you? It’s ok to look at the method names and deduce the property names based on JavaBean convention (getX setX isX etc) but most other kinds of reflection are evil.

And either way, can you dig that out if you find the time?

uckelman · July 9, 2020, 9:55pm

Thus spake Flint1b:

You haven’t used reflection for this have you? It’s ok to look at the
method names and deduce the property names based on JavaBean convention
(getX setX isX etc) but most other kinds of reflection are evil.

And either way, can you dig that out if you find the time?

I will, I just haven’t gotten to it yet.

–
J.

uckelman · July 19, 2020, 5:56pm

Thus spake Flint1b:

I have made a rough proposal now under PR #75
github.com/vassalengine/vassal/pull/75[1], can someone, maybe
Brent, have a look at this and give me some feedback? This downcasting
sure looks unsafe but if we are 100% sure that only objects of the
proper type ever pass through this code then casting like this is ok.

Of possible relevance here is what’s in the VASSAL.property package.
It contains some machinery for getting and setting properties in a
type-safe way which I added for 3.2 as a start for cleaning up property
handling, and then it never got used.

The two interfaces PropertyExporter extends, PropertyNameSource and
PropertySource, are from just before I started introducing generics,
and are really unfortunate in that they keys and values are Objects
despite that we do know their types…

The VASSAL.property package was intended to address this. There’s
a lot of simplification to be had if it, or something like it were
to be used (and everything converted to use it). There’s also support
for separating persistent state from ephemeral state, which could be
some help toward effecting model-view separation.

Also:

“uckelman” wrote:

What I had was a top-level piece wrapper containing a HashMap from
property names to getters and setters for the properties. What this
does is change getting or setting a property from being O(n) to O(1),
which should make a massive difference with pieces that have a long
list of traits. (It might be the single most significant optimization
we could do for a lot of modules.)

I still have it somewhere. I should dig that out.

You haven’t used reflection for this have you? It’s ok to look at the
method names and deduce the property names based on JavaBean convention
(getX setX isX etc) but most other kinds of reflection are evil.

And either way, can you dig that out if you find the time?

I had a look around and turned up a few of my benchmark cases, but not
the code I had in mind. No matter, as I can still describe it:

Every trait class has a pair of getProperty() and setProperty() methods, which are more or less similar structurally. Here are the ones from Decorator:

@Override
public Object getProperty(Object key) {
if (Properties.KEY_COMMANDS.equals(key)) {
return getKeyCommands();
}
else if (Properties.INNER.equals(key)) {
return piece;
}
else if (Properties.OUTER.equals(key)) {
return dec;
}
else if (Properties.VISIBLE_STATE.equals(key)) {
return myGetState()+piece.getProperty(key);
}
else if (Properties.SELECTED.equals(key)) {
return selected;
}
else {
return piece.getProperty(key);
}
}

@Override
public void setProperty(Object key, Object val) {
if (Properties.INNER.equals(key)) {
setInner((GamePiece) val);
}
else if (Properties.OUTER.equals(key)) {
dec = (Decorator) val;
}
/**
* Cache Selection status and pass it on to all inner traits.
*/
else if (Properties.SELECTED.equals(key)) {
if (val instanceof Boolean) {
setSelected((Boolean) val);
}
else {
setSelected(false);
}
piece.setProperty(key, val);
}
else {
piece.setProperty(key, val);
}
}

These are painfully slow. Whenever you want to do a lookup of a
property, you havae to do a number of string comparisons which is O(n)
in the number of total properties—and not just the total number of
properties in the trait where your property is, but the total number of
properties in the whole stack of traits. This is tollerable when you
have simple pieces, but becomes a disaster when you have pieces with 50
traits (as could be seen in one of Brian’s stack traces a week or two
ago). If you’re unlucky and the property you want is in a trait near the
bottom of the stack, you will have to do hundreds of string comparisons
to get that value. The Gang of Four book should have included a note
that the Decorator pattern was not suitable for stacks of decorators as
deep as we have.

There are at least two ways the above could be improved: Locally, and globally.

Locally, one could put getters and setters for a trait into a map and
have O(1) access to properties within each trait:

protected interface Getter {
public Object get(GamePiece p, Object key);
}

protected interface Setter {
public void set(GamePiece p, Object key, Object value);
}

// This is a map of containing the getting code from old getProperty()
protected static final Map<Object, Getter> getters = Map.ofEntries<>(
entry(Properties.KEY_COMMANDS, (p, k) → p.getKeyCommands()),
entry(Properties.INNER, (p, k) → p.piece),
entry(Properties.OUTER, (p, k) → p.dec),
entry(Properties.VISIBLE_STATE, (p, k) →
p.myGetState() + p.piece.getProperty(k)),
entry(Properties.SELECTED, (p, k) → p.selected)
);

// This is a map of containing the setting code from old setProperty()
protected static final Map<Object, Setter> setters = Map.ofEntries<>(
entry(Properties.INNER, (p, k, v) → p.setInner((GamePiece) v)),
entry(Properties.OUTER, (p, k, v) → p.dec = (Decorator) v),
entry(Properties.SELECTED, (p, k, v) → {
setSelected(v instanceof Boolean ? (Boolean) v : false);
p.piece.setProperty(k, v);
})
);

// … or one could have a Map<Object, Pair<Getter, Setter>> to avoid
// repetition

// These cover the extremal case, when the decorator delegates getting or
// setting to the piece it wraps
protected static final Getter delegatedGetter =
(p, k) → p.piece.getProperty(k);

protected static final Setter delegatedSetter =
(p, k, v) → p.piece.setProperty(k, v);

@Override
public Object getProperty(Object key) {
return getters.getOrDefault(key, delegatedGetter).get(this, key);
}

@Override
public void setProperty(Object key, Object value) {
return setters.getOrDefault(key, delegatedSetter).set(this, key, value);
}

This ought to be quite a bit faster for property lookup. You still have
to traverse the chain of traits, but each step will be faster, since you
can avoid doing so many string comparisons at each step. (If you have a
stack of 20 traits, each with 9 properties and you have to go all the
way to the bottom to retrieve a property, it’s about 2.5x faster. It’s
still O(n), but now n is the number of traits rather than the number of
properties, so n is much smaller.) This also has the virtue of being
invisible from the outside.

The best solution, however, would be to store the outermost getter and
setter for each propery in a map held by the outermost decorator. That
way, there would be completely flat constant-time access to all
properties. This isn’t as straightforward to do as what I described
above, as it’s not static—it depends on trait order, so would have to
be set up for each piece (or, if one wanted to be clever, piece type).
Basically what you’d do at creation time is traverse the trait chain and
grab the first getter/setter you find for each property as you walk
toward the end. It may be possible to manage this without breaking the
API as well, though doing so could be rather fiddly. I know of nothing
which would improve performance across all modules more than this
change.

–
J.

Cattlesquat · July 19, 2020, 6:41pm

Wow it sounds like just what the doctor ordered…

Flint1b · July 19, 2020, 9:50pm

This sounds good in theory, but in practice, so far we have only determined the Piececloner.clone() method to be the one big performance hog. Once we remove most calls to it, we can measure again and go from there.

Remember one of the better things Donald Knuth wrote, “premature optimization is the root of all evil”, or the full quote:

Some smartass professor at the uni has actually taught us this:

for (int i = 0; i < getSize(); i++) {
  // do something
}

… is better written like this, cause it only calls getSize() once:

int size = getSize();
for (int i = 0; i < size; i++) {
  // do something
}

This is a) premature optimization and b) modern hotspot VMs optimize this on their own if they see the need for it

uckelman · July 19, 2020, 9:58pm

Thus spake Flint1b:

This sounds good in theory, but in practice, so far we have only
determined the Piececloner.clone() method to be the one big performance
hog. Once we remove most calls to it, we can measure again and go from
there.

I don’t know what else I can say other than that my claims are based on
having profiled this, in 2010 or so.

If you want the most egregious example I know of, try the VG Vietnam
1965-1975 module and check yourself.

–
J.

Brent_Easton · July 19, 2020, 10:57pm

Yes, it will very much depend on the trait dept, number of GamePieces on the Map, plus the rate they are being hit by GKC’s. Try also the new scenario load and card play for Paths of Glory. (Sorry, Brian, but it runs like a dog).

Cattlesquat · July 20, 2020, 12:37pm

Haha yes for sure, but an ancient dog loved by it’s community!

uckelman · July 30, 2020, 10:16pm

I tried the Stalingrad '42 module both before and after the PieceCloner changes, and didn’t see a noticeable difference.

I tried a bunch of actions on peices for a minute or so; some, like Rail Movement, were catastrophically slow. VisualVM’s profiler isn’t showing me any time taken by PieceCloner.clonePiece(), but I’m getting 13% in Zone.getMutablePropertiesContaineId(), 10% in ChangePropertyCommandEncoder.encode(), almost 10% in DynamicProperty$1.buildChildConfigurer(), 9% in Zone.getMap().

Flint1b · July 30, 2020, 10:51pm

Exactly, this is why I don’t trust this profiler. There wouldn’t be any companies that earn money only by selling profilers if the VisualVM profiler was any good.

I just took a good old stopwatch and measured the following in Vassal 3.3.2 and PR 75:

start stalingrad 42
in wizard select “new game”, “campaign game”, wait until that loads
select “solitaire”, click “finish” and start the stopwatch
wait until the map is fully loaded and CPU usage goes down to idle levels
stop stopwatch

I got 1:20min for 3.3.2 versus 0:40 for PR 75.

And the results with “Path of Glory” were similar, I had 2:20min versus 1:00min there.

uckelman · July 30, 2020, 10:55pm

Thus spake Flint1b:

“uckelman” wrote:

VisualVM’s profiler isn’t showing me any time taken by
PieceCloner.clonePiece()

Exactly, this is why I don’t trust this profiler. There wouldn’t be any
companies that earn money only by selling profilers if the VisualVM
profiler was any good.

That doesn’t follow. There are companies selling compilers, and there are
very good free compilers available…

I just took a good old stopwatch and measured the following in Vassal
3.3.2 and PR 75:

start stalingrad 42

in wizard select “new game”, “campaign game”, wait until that loads

select “solitaire”, click “finish” and start the stopwatch

wait until the map is fully loaded and CPU usage goes down to idle
levels

stop stopwatch

I got 1:20min for 3.3.2 versus 0:40 for PR 75.

Ah. We were not timing the same things. I will check again.

–
J.

Flint1b · July 30, 2020, 11:13pm

They have to add things on top to compete with very good free stuff, usually support, a good IDE that is tailorcut to their compiler, features that the free compilers don’t have, licensing.

uckelman · July 30, 2020, 11:15pm

Thus spake Flint1b:

I got 1:20min for 3.3.2 versus 0:40 for PR 75.

I got 0:08.59 for master vs. 0:05.31 for PR 75. Yes, those are seconds,
that’s not a mistake.

As I’m getting times which are about 1/9 of yours, you can see why I didn’t
notice the difference.

–
J.

Flint1b · July 30, 2020, 11:27pm

I know, I know, I need a faster computer. OTOH, this makes me a human performance profiler, maybe I should open up a profiling company and offer my services

uckelman · July 30, 2020, 11:39pm

So, visualvm counted Decorator.getProperty() being called 5 MILLION TIMES while loading the Campaign scenario of Stalingrad '42, and Decorator.getMap() was called another 2m times. That could be largely avoided by having direct property access via the outermost piece. Traversal of the trait stack’s not free.

Flint1b · July 30, 2020, 11:52pm

What does this mean for the overall CPU time, for PR 75, what is a Trait stack, what is a “outermost piece”… I don’t know the internals well enough, I’m afraid this is beyond me.

uckelman · July 31, 2020, 12:09am

Thus spake Flint1b:

What does this mean for the overall CPU time, for PR 75, what is a Trait
stack, what is a “outermost piece”… I don’t know the internals well
enough, I’m afraid this is beyond me.

Each trait class except for BasicPiece wraps another trait class, which
is the “inner” piece. (That’s Decorator.inner, literally.)

So if you have, say, a piece that can has two sides and can be marked
moved, you might have this:

MovementMarkable → Embellishment → BasicPiece

The instance of MM is the “outermost”. It has some properties, but
delegates inward to Embellishment any properties its asked about but
doesn’t have itself, and Embellishment does the same thing. If you
get to the bottom and BasicPiece doesn’t have the property, then nothing
does and a null gets returned.

Ask Brian how many traits he has on some of the pieces in his modules. It’s
more than three.

This is a totally inappropriate design once you start getting traits more
than a few deep. Just do do a simple thing like get the value of a string,
you may have to go through all the layers one at a time, and that can be
horribly slow. This is the main reason why operations on pieces take
ages sometimes.

The solution to this problem is what I was talking about in the post
linked below:

This all orthogonal to PR 75. I’ll see about merging that on Friday.

–
J.