Create account / Log in

CPU profiling

Discussion area for the development team.

Moderators: uckelman, Tim M

CPU profiling

Postby Flint1b » June 15th, 2020, 12:20 am

A question about performance benchmarks came up, in case there is an interest in current performance hogs in the code, I did a quick run in a profiler, doing the following steps:

preconditions:
- use Stalingrad 42 module, a module with a fairly big map, a few hundred counters
- use the current master branch with the latest commits
- make sure the tiling process is complete for this module, as that is the common scenario in everyday use

profiling:
- start the Player, pass it the module and -standalone as arguments
- select one of the premade scenarios in the wizard, this should be very similar to opening an existing savegame
- wait until the map fully loads, this takes a long time, good for profiling
- once the map loads up, scroll scroll scroll, zoom zoom zoom, watch how the map tiles appear on the screen one by one, this is also a bit slow at times, which is good for profiling
- move some counters around, move some stacks around, do some right-click actions on some counters, this went rather quick so probably not much impact on CPU time
- close the player (should have saved the game to profile the save code as well, I know, but for a user the time until it loads up and he can start moving the pieces around is probably more important than the performance of the save code)

Results are in the image below, the main performance hogs seem to be:
- VASSAL.counters.PieceCloner.clonePiece() - 17% + 14% of total CPU time (17% in this KeyEvent-recursion, 14% in GlobalKeyCommand/GlobalCommand subtree)
- VASSAL.counters.Decorator.getType() - 8%
- VASSAL.build.module.Map.StackMetrics.draw() - 8%
- VASSAL.build.module.BasicCommandEncoder.createPiece() - 10%
- VASSAL.tools.imageop.ScaleOpTiledBitmapImpl$TileOp.eval() - 10%

I think it is safe to say that PieceCloner.clonePiece() is the big bad guy here, if CPU performance is a concern then all optimization efforts should be aimed at this one method. It might make sense to start by splitting it up into several small methods, as the JVM can optimize those much better. Maybe also remove the recursion and replace it with iteration. The C++ devs among us probably have even more ideas how to improve this code to make it faster.

The image is quite big and the forums won't let me upload it so it's here: https://drive.google.com/file/d/12lXK-y ... sp=sharing
User avatar
Flint1b
 
Posts: 461
Joined: May 19th, 2020, 12:27 am
Location: Colonia Agrippina

Re: CPU profiling

Postby Brent Easton » June 15th, 2020, 1:10 am

I'm not sure investing time in optimizing the PieceCloner is worth it. It's primary purpose is to create new Pieces that are a duplicate of an existing piece or piece definition. It is an inherently expensive process and there is a limit to what you will be able to achieve.

A better approach is to look at why it is being called so often after a game has started.

Piece cloning should only happen relatively rarely, at game start when pieces are created, or when pieces are created manually by players (drag and drop from palette, replace etc.).

The real culprit is this little gem:

Code: Select all
piece.setProperty(Properties.SNAPSHOT, PieceCloner.getInstance().clonePiece(piece));

which is called all over the place to create a backup copy of the current piece before doing something to it.

The only place that this backup is used is in the ReportState trait to report on the previous values of properties maintained by this piece. In early Vassal, this was the only way to save the property values from an earlier instance of a piece.

In 2009(!), we changed Decorator (and BasicPiece) to implement the PropertyNameSource interface which allows every Decorator to expose the names of the properties they maintain. I use this in the Beanshell expression builder to create lists of properties that can be inserted into the current expression.

I expect it will be FAR more efficient to extract the exposed property values into a Map and save that as the SNAPSHOT to use in the ReportState trait, than cloning the piece. You will also need to review the existing Decorator.getPropertyNames() implementations to make sure they are up to date.

I think you will get far more improvement for substantially less work.

Rgds.
User avatar
Brent Easton
 
Posts: 3228
Joined: December 21st, 2007, 3:06 am
Location: Berry, NSW, Australia

Re: CPU profiling

Postby uckelman » June 15th, 2020, 12:07 pm

Thus spake Brent Easton:
>
> In 2009(!), we changed Decorator (and BasicPiece) to implement the
> PropertyNameSource interface which allows every Decorator to expose the
> names of the properties they maintain. I use this in the Beanshell
> expression builder to create lists of properties that can be inserted
> into the current expression.
>
> I expect it will be FAR more efficient to extract the exposed property
> values into a Map and save that as the SNAPSHOT to use in the
> ReportState trait, than cloning the piece. You will also need to review
> the existing Decorator.getPropertyNames() implementations to make sure
> they are up to date.
>
> I think you will get far more improvement for substantially less work.

I did some work on this which never made it to a release. What I had was
a top-level piece wrapper containing a HashMap from property names to
getters and setters for the properties. What this does is change getting
or setting a property from being O(n) to O(1), which should make a
massive difference with pieces that have a long list of traits. (It
might be the single most significant optimization we could do for a
lot of modules.)

I still have it somewhere. I should dig that out.

--
J.
User avatar
uckelman
Site Admin
 
Posts: 9001
Joined: December 10th, 2007, 9:48 am
Location: Durham, England

Re: CPU profiling

Postby Flint1b » June 15th, 2020, 12:39 pm

Brent Easton wrote:I expect it will be FAR more efficient to extract the exposed property values into a Map and save that as the SNAPSHOT to use in the ReportState trait, than cloning the piece. You will also need to review the existing Decorator.getPropertyNames() implementations to make sure they are up to date.


Sounds good! This leads to another question though: At the moment the PieceCloner works on GamePiece-type objects, but GamePiece is not related to PropertyNameSource. All types that implement GamePiece except for the Stack and the Deck also extend or implement PropertyNameSource. For this change the GamePiece would have to extend PropertyNameSource, but how would Stack and Deck implement this interface, whose properties would they return, those of all their contents or just the properties of a single piece inside them or no properties at all?

Another two slightly off-topic questions about these properties, are the property keys always of type String? Are the only possible property keys the ones defined in the Properties interface, and could they be changed into an enum?
User avatar
Flint1b
 
Posts: 461
Joined: May 19th, 2020, 12:27 am
Location: Colonia Agrippina

Re: CPU profiling

Postby Brent Easton » June 15th, 2020, 1:16 pm

For the generation of the SNAPSHOTS for the ReportState, the PieceCloner is only ever called with a Decorator, or potentially a BasicPiece (For a piece consisting of a BasicPiece trait only), both of which implement PropertyNameSource.

Even though the Inner piece of a Decorator is defined as a GamePiece, it can never be anything except a Decorator or a GamePiece.

Sadly, a BasicPiece is not a Decorator, which will lead to some instanceof and casting ugliness somewhere.

There is no need to try and support Stacks or Decks.

Yes, the property names are always Strings. But the Properties interface only contains the property names that are commonly used from components outside a piece. There are many more properties used internally by various traits and some of those names are variable between modules (created by the module designer - Case in point, Dynamic Property names). getPropertyNames() in the PropertyNameSource interface returns those names.
User avatar
Brent Easton
 
Posts: 3228
Joined: December 21st, 2007, 3:06 am
Location: Berry, NSW, Australia

Re: CPU profiling

Postby Flint1b » July 9th, 2020, 9:51 pm

I have made a rough proposal now under PR #75 https://github.com/vassalengine/vassal/pull/75, can someone, maybe Brent, have a look at this and give me some feedback? This downcasting sure looks unsafe but if we are 100% sure that only objects of the proper type ever pass through this code then casting like this is ok.


Also:
uckelman wrote:What I had was a top-level piece wrapper containing a HashMap from property names to getters and setters for the properties. What this does is change getting or setting a property from being O(n) to O(1), which should make a massive difference with pieces that have a long list of traits. (It might be the single most significant optimization we could do for a lot of modules.)

I still have it somewhere. I should dig that out.


You haven't used reflection for this have you? It's ok to look at the method names and deduce the property names based on JavaBean convention (getX setX isX etc) but most other kinds of reflection are evil.

And either way, can you dig that out if you find the time?
User avatar
Flint1b
 
Posts: 461
Joined: May 19th, 2020, 12:27 am
Location: Colonia Agrippina

Re: CPU profiling

Postby uckelman » July 9th, 2020, 9:55 pm

Thus spake Flint1b:
>
> You haven't used reflection for this have you? It's ok to look at the
> method names and deduce the property names based on JavaBean convention
> (getX setX isX etc) but most other kinds of reflection are evil.
>
> And either way, can you dig that out if you find the time?

I will, I just haven't gotten to it yet.

--
J.
User avatar
uckelman
Site Admin
 
Posts: 9001
Joined: December 10th, 2007, 9:48 am
Location: Durham, England

Re: CPU profiling

Postby uckelman » July 19th, 2020, 5:56 pm

Thus spake Flint1b:
> I have made a rough proposal now under PR #75
> https://github.com/vassalengine/vassal/pull/75[1], can someone, maybe
> Brent, have a look at this and give me some feedback? This downcasting
> sure looks unsafe but if we are 100% sure that only objects of the
> proper type ever pass through this code then casting like this is ok.
>

Of possible relevance here is what's in the VASSAL.property package.
It contains some machinery for getting and setting properties in a
type-safe way which I added for 3.2 as a start for cleaning up property
handling, and then it never got used.

The two interfaces PropertyExporter extends, PropertyNameSource and
PropertySource, are from just before I started introducing generics,
and are really unfortunate in that they keys and values are Objects
despite that we do know their types...

The VASSAL.property package was intended to address this. There's
a lot of simplification to be had if it, or something like it were
to be used (and everything converted to use it). There's also support
for separating persistent state from ephemeral state, which could be
some help toward effecting model-view separation.

> Also:
>
> "uckelman" wrote:
> > What I had was a top-level piece wrapper containing a HashMap from
> > property names to getters and setters for the properties. What this
> > does is change getting or setting a property from being O(n) to O(1),
> > which should make a massive difference with pieces that have a long
> > list of traits. (It might be the single most significant optimization
> > we could do for a lot of modules.)
> >
> > I still have it somewhere. I should dig that out.
>
>
> You haven't used reflection for this have you? It's ok to look at the
> method names and deduce the property names based on JavaBean convention
> (getX setX isX etc) but most other kinds of reflection are evil.
>
> And either way, can you dig that out if you find the time?

I had a look around and turned up a few of my benchmark cases, but not
the code I had in mind. No matter, as I can still describe it:

Every trait class has a pair of getProperty() and setProperty() methods, which are more or less similar structurally. Here are the ones from Decorator:

@Override
public Object getProperty(Object key) {
if (Properties.KEY_COMMANDS.equals(key)) {
return getKeyCommands();
}
else if (Properties.INNER.equals(key)) {
return piece;
}
else if (Properties.OUTER.equals(key)) {
return dec;
}
else if (Properties.VISIBLE_STATE.equals(key)) {
return myGetState()+piece.getProperty(key);
}
else if (Properties.SELECTED.equals(key)) {
return selected;
}
else {
return piece.getProperty(key);
}
}

@Override
public void setProperty(Object key, Object val) {
if (Properties.INNER.equals(key)) {
setInner((GamePiece) val);
}
else if (Properties.OUTER.equals(key)) {
dec = (Decorator) val;
}
/**
* Cache Selection status and pass it on to all inner traits.
*/
else if (Properties.SELECTED.equals(key)) {
if (val instanceof Boolean) {
setSelected((Boolean) val);
}
else {
setSelected(false);
}
piece.setProperty(key, val);
}
else {
piece.setProperty(key, val);
}
}

These are painfully slow. Whenever you want to do a lookup of a
property, you havae to do a number of string comparisons which is O(n)
in the number of total properties---and not just the total number of
properties in the trait where your property is, but the total number of
properties in the whole stack of traits. This is tollerable when you
have simple pieces, but becomes a disaster when you have pieces with 50
traits (as could be seen in one of Brian's stack traces a week or two
ago). If you're unlucky and the property you want is in a trait near the
bottom of the stack, you will have to do hundreds of string comparisons
to get that value. The Gang of Four book should have included a note
that the Decorator pattern was not suitable for stacks of decorators as
deep as we have.

There are at least two ways the above could be improved: Locally, and globally.

Locally, one could put getters and setters for a trait into a map and
have O(1) access to properties within each trait:

protected interface Getter {
public Object get(GamePiece p, Object key);
}

protected interface Setter {
public void set(GamePiece p, Object key, Object value);
}

// This is a map of containing the getting code from old getProperty()
protected static final Map getters = Map.ofEntries<>(
entry(Properties.KEY_COMMANDS, (p, k) -> p.getKeyCommands()),
entry(Properties.INNER, (p, k) -> p.piece),
entry(Properties.OUTER, (p, k) -> p.dec),
entry(Properties.VISIBLE_STATE, (p, k) ->
p.myGetState() + p.piece.getProperty(k)),
entry(Properties.SELECTED, (p, k) -> p.selected)
);

// This is a map of containing the setting code from old setProperty()
protected static final Map setters = Map.ofEntries<>(
entry(Properties.INNER, (p, k, v) -> p.setInner((GamePiece) v)),
entry(Properties.OUTER, (p, k, v) -> p.dec = (Decorator) v),
entry(Properties.SELECTED, (p, k, v) -> {
setSelected(v instanceof Boolean ? (Boolean) v : false);
p.piece.setProperty(k, v);
})
);

// ... or one could have a Map> to avoid
// repetition

// These cover the extremal case, when the decorator delegates getting or
// setting to the piece it wraps
protected static final Getter delegatedGetter =
(p, k) -> p.piece.getProperty(k);

protected static final Setter delegatedSetter =
(p, k, v) -> p.piece.setProperty(k, v);

@Override
public Object getProperty(Object key) {
return getters.getOrDefault(key, delegatedGetter).get(this, key);
}

@Override
public void setProperty(Object key, Object value) {
return setters.getOrDefault(key, delegatedSetter).set(this, key, value);
}

This ought to be quite a bit faster for property lookup. You still have
to traverse the chain of traits, but each step will be faster, since you
can avoid doing so many string comparisons at each step. (If you have a
stack of 20 traits, each with 9 properties and you have to go all the
way to the bottom to retrieve a property, it's about 2.5x faster. It's
still O(n), but now n is the number of traits rather than the number of
properties, so n is much smaller.) This also has the virtue of being
invisible from the outside.

The best solution, however, would be to store the outermost getter and
setter for each propery in a map held by the outermost decorator. That
way, there would be completely flat constant-time access to all
properties. This isn't as straightforward to do as what I described
above, as it's not static---it depends on trait order, so would have to
be set up for each piece (or, if one wanted to be clever, piece type).
Basically what you'd do at creation time is traverse the trait chain and
grab the first getter/setter you find for each property as you walk
toward the end. It may be possible to manage this without breaking the
API as well, though doing so could be rather fiddly. I know of nothing
which would improve performance across all modules more than this
change.

--
J.
User avatar
uckelman
Site Admin
 
Posts: 9001
Joined: December 10th, 2007, 9:48 am
Location: Durham, England

Re: CPU profiling

Postby Cattlesquat » July 19th, 2020, 6:41 pm

Wow it sounds like just what the doctor ordered...
User avatar
Cattlesquat
 
Posts: 951
Joined: December 2nd, 2019, 4:57 pm
Location: Baltimore, Maryland, USA

Re: CPU profiling

Postby Flint1b » July 19th, 2020, 9:50 pm

This sounds good in theory, but in practice, so far we have only determined the Piececloner.clone() method to be the one big performance hog. Once we remove most calls to it, we can measure again and go from there.

Remember one of the better things Donald Knuth wrote, "premature optimization is the root of all evil", or the full quote:
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.


Some smartass professor at the uni has actually taught us this:
Code: Select all
for (int i = 0; i < getSize(); i++) {
  // do something
}

... is better written like this, cause it only calls getSize() once:
Code: Select all
int size = getSize();
for (int i = 0; i < size; i++) {
  // do something
}


This is a) premature optimization and b) modern hotspot VMs optimize this on their own if they see the need for it
User avatar
Flint1b
 
Posts: 461
Joined: May 19th, 2020, 12:27 am
Location: Colonia Agrippina

Re: CPU profiling

Postby uckelman » July 19th, 2020, 9:58 pm

Thus spake Flint1b:
> This sounds good in theory, but in practice, so far we have only
> determined the Piececloner.clone() method to be the one big performance
> hog. Once we remove most calls to it, we can measure again and go from
> there.

I don't know what else I can say other than that my claims are based on
having profiled this, in 2010 or so.

If you want the most egregious example I know of, try the VG Vietnam
1965-1975 module and check yourself.

--
J.
User avatar
uckelman
Site Admin
 
Posts: 9001
Joined: December 10th, 2007, 9:48 am
Location: Durham, England

Re: CPU profiling

Postby Brent Easton » July 19th, 2020, 10:57 pm

Yes, it will very much depend on the trait dept, number of GamePieces on the Map, plus the rate they are being hit by GKC's. Try also the new scenario load and card play for Paths of Glory. (Sorry, Brian, but it runs like a dog).
User avatar
Brent Easton
 
Posts: 3228
Joined: December 21st, 2007, 3:06 am
Location: Berry, NSW, Australia

Re: CPU profiling

Postby Cattlesquat » July 20th, 2020, 12:37 pm

Haha yes for sure, but an ancient dog loved by it's community!
User avatar
Cattlesquat
 
Posts: 951
Joined: December 2nd, 2019, 4:57 pm
Location: Baltimore, Maryland, USA

Re: CPU profiling

Postby uckelman » July 30th, 2020, 10:16 pm

I tried the Stalingrad '42 module both before and after the PieceCloner changes, and didn't see a noticeable difference.

I tried a bunch of actions on peices for a minute or so; some, like Rail Movement, were catastrophically slow. VisualVM's profiler isn't showing me any time taken by PieceCloner.clonePiece(), but I'm getting 13% in Zone.getMutablePropertiesContaineId(), 10% in ChangePropertyCommandEncoder.encode(), almost 10% in DynamicProperty$1.buildChildConfigurer(), 9% in Zone.getMap().
User avatar
uckelman
Site Admin
 
Posts: 9001
Joined: December 10th, 2007, 9:48 am
Location: Durham, England

Re: CPU profiling

Postby Flint1b » July 30th, 2020, 10:51 pm

uckelman wrote:VisualVM's profiler isn't showing me any time taken by PieceCloner.clonePiece()


Exactly, this is why I don't trust this profiler. There wouldn't be any companies that earn money only by selling profilers if the VisualVM profiler was any good.

I just took a good old stopwatch and measured the following in Vassal 3.3.2 and PR 75:
- start stalingrad 42
- in wizard select "new game", "campaign game", wait until that loads
- select "solitaire", click "finish" and start the stopwatch
- wait until the map is fully loaded and CPU usage goes down to idle levels
- stop stopwatch

I got 1:20min for 3.3.2 versus 0:40 for PR 75.

And the results with "Path of Glory" were similar, I had 2:20min versus 1:00min there.
User avatar
Flint1b
 
Posts: 461
Joined: May 19th, 2020, 12:27 am
Location: Colonia Agrippina

Next

Return to Developers

Who is online

Users browsing this forum: No registered users and 1 guest