NoClassDefFoundError for DigestUtils

How does this happen?

vassalengine.org/tracker/sho … i?id=13194

We’ve been getting reports of this problem intermittently as far back as 3.2.2. I thought it was something odd about people’s Java installations, but we’ve also gotten a few of these now since 3.3.0 with Java that we’ve bundled ourselves.

The appearance is that Java’s class loader is failing to find org.apache.commons.codec.digest.DigestUtils, which is strange because that’s in the commons-codec JAR we distribute, and this has to be working for the vast majority of users, otherwise we’d be getting loads of reports rather than one every so often. So… what’s going on here?
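
For anyone wanting to poke at this on an affected machine, here is a minimal sketch of a probe, not VASSAL code; the class name ClassProbe and its probe method are made up for illustration. It asks the class loader for a class by name and reports which JAR supplied it. As background: NoClassDefFoundError is what the JVM throws when already-loaded code refers to a class whose definition can't be found or loaded at run time, while ClassNotFoundException is the checked exception from explicit lookups such as Class.forName. The sketch treats both as "missing".

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports whether a class can be loaded and where it came from. */
class ClassProbe {

    /** Returns a one-line description of the load result for the given class name. */
    static String probe(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, ClassProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes supplied by the JDK itself may have no code source
            String where = (src == null || src.getLocation() == null)
                ? "JDK runtime" : src.getLocation().toString();
            return "OK " + className + " from " + where;
        }
        catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a subclass of LinkageError
            return "MISSING " + className + " (" + e + ")";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0] : "org.apache.commons.codec.digest.DigestUtils";
        System.out.println(probe(name));
        // The effective classpath is often the first thing worth comparing between machines
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Run with the same JARs on the classpath as the failing launch, the "from" location shows which copy of commons-codec (if any) the class loader actually picked up.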

People playing modules which roll their own Random implementation compiled under Java 1.2 have whole other issues.

  private static final int[] MAGIC = new int[]{0, -1727483681};
  private static final int MAGIC_FACTOR1 = 1812433253;
  private static final int MAGIC_FACTOR2 = 1664525;
  private static final int MAGIC_FACTOR3 = 1566083941;
  private static final int MAGIC_MASK1 = -1658038656;
  private static final int MAGIC_MASK2 = -272236544;
  private static final int MAGIC_SEED = 19650218;
  private static final long DEFAULT_SEED = 5489L;

It’s magic :smiley:

Most efficient way to solve this is probably to get remote access to the user’s PC and properly install Vassal for them.

And if passing around Java 1.2-compiled custom Random implementations among module developers is a thing, we should definitely put a stop to this. No wonder players are complaining about bad dice results.

Thus spake Flint1b:

Most efficient way to solve this is probably to get remote access to the
user’s PC and properly install Vassal for them.

These are all reports of the same bug:

9635: 3.2.1, Windows 7, Java 7u9, B-17_QotS-v3.9.vmod
9643: 3.2.2, Mac OS X, Java 6u37, Sword&Sail.vmod
9748: 3.2.2, Mac OS X, Java 6u37, Vassal40k 5.4F.vmod
10216: 3.2.6, Mac OS X 10.7.5, Java 6u45, Pandemic_with_On_the_Brink_1.1.vmod
10650: 3.2.8, Mac OS X 10.9, Java 6u65, Sekigahara_1.4.vmod
10670: 3.2.8, Mac OS X 10.6.8, Java 6u51, Ameritrash_v1.0.vmod
10808: 3.2.10, Windows 7, Java 7u45, Carcassonne-2.0.5.vmod
10839: 3.2.10, Mac OS X 10.9.1, Java 6u65, Combat_Commander_Europe.vmod
10874: 3.2.11, Mac OS X 10.9, Java 6u65, WMH_Vassal43.vmod
10941: 3.2.11, Windows 7, Java 6u45, Malifaux.v1.31.vmod
11334: 3.2.13, Linux, Java 7u55, Betrayal_At_House_On_The_Hill_2.0.vmod
11410: 3.2.13, Linux, Java 7u65, WMH_Vassal43.vmod
11553: 3.2.13, Linux, Java 7u65, Carcassonne-hg.1.1.vmod
12506: 3.2.15, Windows 8.1, Java 8u31, Battlestar_Galactica_FFG_2.0.vmod
12606: 3.2.17, Windows 10, Java 8u251, Napoleonic_Wars.vmod
12773: 3.2.15, Mac OS X 10.10.2, Quarriors!_2_0.vmod
12849: 3.2.17, Mac OS X 10.14.6, King_of_Tokyo_2.0.vmod
12905: 3.2.17, Windows 8, Java 6u45, Combat_Commander_Europe.vmod
13160: 3.3.1, Windows 10, Java 14.0.1, Target_for_Today_1_0.vmod
13194: 3.3.2, Windows 10, Java 14.0.1, VIPv341p02.vmod

We have reports from all three major operating systems; from Java 6, 7, 8,
and 14; from several different modules.

I have a hard time believing that these are all due to users having
botched the install.

You can’t think of any way this could happen?

And if passing around Java 1.2-compiled custom Random implementations
among module developers is a thing, we should definitely put a stop to
this. No wonder players are complaining about bad dice results.

I’m sure you’ll find all manner of horrors in module custom code, but I
can’t see any way it could be causing the problem at hand.

If that’s a hornet’s nest you’d like to kick, suggest how you’d like to
go about it.


J.

Well since it’s a bug that breaks the AbstractLaunchAction, it would be a show stopper if it happened to many/most users, Vassal would be simply unusable then.

But it seems that this bug appears only for a few users and not all of the time, else there would be a storm of bug reports and users crying “Vassal doesn’t run”.

I would apply modern project management methods to it and prioritize important showstopper bugs to the top, important improvements/refactorings/new features in the middle, and ultra rare bugs like this one all the way to the bottom.

I personally prefer to work at the top or in the middle, if we ever get enough resources we can give this bug to a junior dev or a trainee.

Thus spake Flint1b:

Well since it’s a bug that breaks the AbstractLaunchAction, it would be
a show stopper if it happened to many/most users, Vassal would be simply
unusable then.

That’s why it would be good to fix, because it makes VASSAL completely
unusable for some people.

Aren’t you curious why such a reliable system as class loading fails
in this case?


J.

As Star Trek’s Mr. Spock said, the needs of the many outweigh the needs of the few.

What percentage of users have this bug? And do they hit it all the time, or just occasionally / rarely?

I wouldn’t support 32bit systems either without first gathering data on how many % need 32bit. But that’s just me, I like Mr. Spock, and I also like it when 98.5% of the population live a much better life if 1.5% is being oppressed in labor camps :smiley:

And curiosity, no I am not curious, I know anything can happen on a user’s system that is not under my control or the control of a professional admin. And most of the time, the error sits in front of the screen :smiley:

Is it possible this is a starting script issue (at least for windows), where it’s not explicitly defining classpaths for all supplied JARs?
Could be getting some knackered old jar from a local install via JAVA_HOME or some such nonsense.

something like lib/Vengine.jar;lib/* in VASSAL.l4j.xml ?

Betrayal version 3.0 works for me,
WMH no idea what that is,
Carcassonne-hg 1.1 works for me,

Linux, Java version 11.0.7, Vassal 3.3.2.

Cannot reproduce → won’t fix. Maybe someone who has access to a Mac and/or Windows can reproduce it, then he might be able to fix it. I can’t even begin fixing a bug that I can’t get hold of.

Thus spake stew-rt:

something like lib/Vengine.jar;lib/* in VASSAL.l4j.xml ?

We’ve had lib\Vengine.jar since 7679589a5 in 2008.


J.

Thus spake Flint1b:

“uckelman” wrote:

11334: 3.2.13, Linux, Java 7u55,
Betrayal_At_House_On_The_Hill_2.0.vmod
11410: 3.2.13, Linux, Java 7u65, WMH_Vassal43.vmod
11553: 3.2.13, Linux, Java 7u65, Carcassonne-hg.1.1.vmod

Betrayal version 3.0 works for me,
WMH no idea what that is,
Carcassonne-hg 1.1 works for me,

Linux, Java version 11.0.7, Vassal 3.3.2.

Cannot reproduce → won’t fix. Maybe someone who has access to a Mac
and/or Windows can reproduce it, then he might be able to fix it. I
can’t even begin fixing a bug that I can’t get hold of.

I can’t reproduce the problem either. That’s why I asked about it,
thinking someone might have seen something like this before, or had
some ideas.

There are the email addresses of five users noted in the bug who could
be asked if they can reproduce it.


J.

It has to be an issue with the local system, else it would be a major showstopper in the Java world. Twitter and Netflix would stop functioning, this would lead to riots and revolutions, the world would end if Java’s class loader were unreliable and sometimes failed to find classes that are there.

I can think of a dozen ways how an incapable user might botch the installation, his whole OS, his hard drive. There’s also such a thing as faulty hardware, broken HD sectors, broken filesystem journal, etc etc.

FIVE users? oh come on… “the needs of the few” :smiley:

Thus spake stew-rt:

Is it possible this is a starting script issue (at least for windows),
where it’s not explicitly defining classpaths for all supplied JARs?
Could be getting some knackered old jar from a local install via
JAVA_HOME or some such nonsense.

We’ve specified just Vengine.jar as the command-line classpath for a
long time—I think that predates the first report of this bug. The
MANIFEST.MF in Vengine.jar explicitly lists all the other JARS on which
it depends, so I don’t see how we’d be picking up any stray bad JARs
this way.


J.
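
A minimal sketch of how the "is a JAR missing?" question could be checked mechanically, assuming the layout described above: one main JAR whose MANIFEST.MF has a Class-Path attribute naming its dependencies relative to its own location. The class name ManifestClassPathCheck is hypothetical and this is not part of VASSAL. It also assumes the entries are plain relative file names with no URL-encoded characters.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical diagnostic: finds manifest Class-Path entries that are absent on disk. */
class ManifestClassPathCheck {

    /** Returns the Class-Path entries of the given JAR that do not exist next to it. */
    static List<String> missingEntries(File jar) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            if (mf == null) return missing;  // JAR has no manifest at all
            String cp = mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
            if (cp == null || cp.trim().isEmpty()) return missing;
            // Class-Path entries are space-separated and relative to the JAR's directory
            File base = jar.getAbsoluteFile().getParentFile();
            for (String entry : cp.trim().split("\\s+")) {
                // Assumption: entries are simple relative paths, not encoded URLs
                if (!new File(base, entry).exists()) missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ManifestClassPathCheck <path-to-main-jar>");
            return;
        }
        List<String> missing = missingEntries(new File(args[0]));
        System.out.println(missing.isEmpty() ? "all Class-Path entries present"
                                             : "missing: " + missing);
    }
}
```

Pointed at the installed Vengine.jar, an empty result would rule out a deleted or unreadable dependency JAR; it would not rule out a corrupt one.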

Thus spake Flint1b:

I can think of a dozen ways how an incapable user might botch the
installation, his whole OS, his hard drive. There’s also such a thing as
faulty hardware, broken HD sectors, broken filesystem journal, etc etc.

So you don’t find it suspect that this has happened across so many
different versions of Java, versions of VASSAL, and operating systems?

I can also think of ways this could happen—someone could have manually
deleted the commons-codec JAR, for example—but none of them seem
plausible as causes across so many disparate systems and versions.

FIVE users? oh come on… “the needs of the few” :smiley:

Five users that left us contact information. Users don’t leave contact
information in most reports left via the bug reporter.

Don’t underestimate the importance of addressing bugs which are complete
showstoppers for users. Those people won’t use VASSAL, and some will take
the other players of their games to another system. Some of them will post
for years in places like BoardGameGeek about how they could never get
VASSAL to run. I’ve seen this over and over again.

Failure to start and crashes cause us reputational damage. The direct value
in fixing this type of bug may be small, but the indirect value is large.


J.

Well, without being able to reproduce it, I don’t know how I can start fixing it. And I think it’s best for all parties involved if I stay out of 1st level support; having me contact users might lead to even more reputational damage.

“Hello, you reported a bug in Vassal 8 years ago, in the meanwhile Java was upgraded several times over, several new Vassal versions got out, you have probably bought several new computers or reinstalled your OS, I made a kid and got her through 1st grade, and NOW after all this time I am interested in fixing this bug, so hand over remote control of your system to me, stay away from the keyboard and mouse, write a detailed report of what you did 8 years ago, preferably never touch a computer again, and if I ever see you talk bad about Vassal on BGG I will find out where you live, where your wife works and where your children go to school.” :smiley:

No, if we can get detailed reports of how this happens, or even better, steps to reproduce, I can look into fixing this, but right now this is a waste of time and resources, the project is better off with me doing what I do best, moving mountains of code in the proper direction, improving the development process and helping the majority of vassal users.

What we could do: package everything in an “uber-jar”, using the Maven Shade Plugin: maven.apache.org/plugins/maven-shade-plugin/

This would lower the chance of the user messing up the installation by deleting/replacing/breaking single jars.

I have no idea whether this would fix this particular bug though. No way to reproduce, no way to test.

Thus spake Flint1b:

"Hello, you reported a bug in Vassal 8 years ago,

I’ve written that email quite a few times, in fact.

Would someone else like to put their hand up to take this issue and
contact the users for details? (Note that several of the reports are
fairly recent. The point of citing the old ones was to note that they’ve
persisted across time.)


J.

I’m not confident enough to contact users for remote sessions, but I spent about 2 hours today on my Windows VM trying to break it… and the only way I managed to produce this output was by deleting (or making unavailable, via permissions) “lib\commons-codec-1.13.jar”. I also tried the following:

  • LOTS of dodgy JAVA_HOME and JRE_HOME environment variables. (even tried CLASSPATH directly to an old jar) - and adding a dir with the old JAR to PATH
  • placing the old version(s) of the lib which is missing this class in the “lib” dir
  • placing the old version of the lib into $JAVA_HOME/lib/
  • placing the old version of the lib into $JAVA_HOME/lib/ext/
  • installing Java 1.8 (via the Oracle installer) and trying all the above again
  • adding JAVA_HOME and JRE_HOME registry keys
  • renaming old versions of the jar to the new one (this resulted in a crash, but different, java realised I’d done this, because the metadata didn’t match and refused to load it)
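
The experiments above are all about whether a stray or stale copy of the library can shadow the good one. A rough sketch of how to see that directly, again not VASSAL code (ClassLocations is a made-up name): ask a class loader for every location that could supply the .class file. More than one hit means there are competing copies, and with the usual delegation order the first one listed is typically the one that wins.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic: lists every place a class file is visible from. */
class ClassLocations {

    /** Returns all URLs from which the given loader can see the class file. */
    static List<URL> locate(String className, ClassLoader loader) throws IOException {
        // Class files are ordinary resources named after the class
        String resource = className.replace('.', '/') + ".class";
        List<URL> found = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(resource);
        while (urls.hasMoreElements()) {
            found.add(urls.nextElement());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0
            ? args[0] : "org.apache.commons.codec.digest.DigestUtils";
        List<URL> hits = locate(name, ClassLoader.getSystemClassLoader());
        if (hits.isEmpty()) {
            System.out.println(name + " is not visible on the classpath");
        }
        for (URL u : hits) {
            System.out.println(u);
        }
    }
}
```

An empty list on an affected machine would point at a missing or unreadable JAR; two or more entries would point at the shadowing scenario.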

For what it’s worth (as essentially an outsider), I don’t see a disadvantage to the uber-jar using Maven Shade, as I can’t think of any benefits from the supporting libraries being distinct jar files…

Thus spake stew-rt:

I’m not confident enough to contact users for remote sessions, but I
spent about 2 hours today on my Windows VM trying to break it…

Thanks for taking a look.

I wasn’t expecting anyone to contact users for remote sessions, just
to contact the users and ask them if they’re still having the problem,
and if the problem is repeatable. If so, then do some basic investigation,
like have them check if any JARs are missing.

I only managed to produce this output by exclusively deleting (or making
unavailable, via permissions) “lib\commons-codec-1.13.jar”, I tried the
following:

  • LOTS of dodgy JAVA_HOME and JRE_HOME environment variables. (even
    tried CLASSPATH directly to an old jar) - and adding a dir with the old
    JAR to PATH
  • placing the old version(s) of the lib which is missing this class in
    the "lib" dir
  • placing the old version of the lib into $JAVA_HOME/lib/
  • placing the old version of the lib into $JAVA_HOME/lib/ext/
  • installing Java 1.8 (via the Oracle installer) and trying all the
    above again
  • adding JAVA_HOME and JRE_HOME registry keys
  • renaming old versions of the jar to the new one (this resulted in a
    crash, but different, java realised I’d done this, because the metadata
    didn’t match and refused to load it)

Good things to have checked.

For what it’s worth (as essentially an outsider), I don’t see a
disadvantage to the uber-jar using Maven Shade, as I can’t think of any
benefits from the supporting libraries being distinct jar files…

A disadvantage is that it’s a dependency we don’t have right now, which
solves a problem we may not actually have.


J.

It would only be a build-time dependency; it wouldn’t get into the final product, and we already have various Maven plugins, so one more won’t hurt. I understand the worry about having too many dependencies, but if Maven is already in use, the number of Maven plugins is usually of no concern as long as they help get the job done.

Having the whole application including all its dependencies in a single .jar is often very convenient: it protects the bundle from being messed with, from single .jars ending up in the wrong place, from the user accidentally deleting one of them, etc.

Even in the enterprise backend Java world, life became much easier once we were able to package our applications as single .jar files including the embedded application server. The operations teams loved not having to maintain a running Tomcat server anymore, or have long phone calls with us about whose particular app took down the whole app server.