Print
Category: Linux Programming
Hits: 568

I was doing some tests with the mathematical library of Zeus-Framework when I noticed some weird beheviour on my code. I was calculation some huge numbers stored in a 64bit double inside a method implemented in the dynamically loaded library libZeusMath.so. Then I compared the result with a calculation done in my test program, witch is a binary application. The test sometimes failed even when I used a rounding tolerance of 0.0000000001. I coudn't figure out where the problem comes from. More or less by luck I've found a very good advice on the net.

In order to activate the extended precision instead of the double precision on Linux platforms you have to deactivate the line in zeusbase/Config/Platforms/LinuxPlatform.hpp:


#define ENABLE_DOUBLE_PRECISION
//#undef ENABLE_DOUBLE_PRECISION

Per default I prefer the double precision.

The following text I copied for you from the original source http://www.wrcad.com/linux_numerics.txt.


  Resolution of Differences Between X86 Linux/glibc Floating-Point to
        Integer Conversion Behavior Relative to Other Platforms

            S. R. Whiteley, Whiteley Research Inc., 8/28/01
                        This email address is being protected from spambots. You need JavaScript enabled to view it.

                          Summary:

    Under Linux/glibc on Intel X86 (and compatible) platforms, the
    floating point processor is operated by default in extended
    precision mode.  This can lead to observed differences with
    respect to conversions from floating point to integers from
    other processors, and from other operating systems such as
    FreeBSD which operate the X86 floating point processor in double
    precision mode.  In particular, a problem has arisen when
    porting legacy workstation and scientific software to Linux.


  Below is a description of a problem encountered in Red Hat Linux
  6.0 involving floating point numerics.

  The problem arose after porting a large CAD application, which has
  run successfully for many years on Sun, HP, Dec, and FreeBSD
  workstations.  The Linux port had been in existence for some time,
  however a user reported seeing numerical problems, which were
  traced to code illustrated by the following example:

      char *a = "0.3";
      double d = atof(a);
      int i = (int)(1000*d);
      if (i != 300)
          printf("Linux result (299)\n");
      else
          printf("Result is 300\n");

  The "300" result is obtained on the workstations that were at hand
  from the vendors listed above, whereas Linux, and later it was
  learned, Windows, returned an "anomalous" result of 299.

    Note added:
      The Windows result was obtained with the mingw port of gcc,
      which uses MSVCRT.DLL.  The version in use at the time
      returned the "299" result.  The version presently in use
      (mingw runtime 2000-01-22) returns "300".

  It should be stressed that the code fragment above represents bad
  programming practice, as it assumes a particular behavior in the
  floating point to integer conversion which is not guaranteed by
  any standard.  The problem is the buried existence of similar
  assumptions in legacy workstation software that one day will be
  ported to Linux, with not entirely successful results.

  The official specification for C/C++ does not require conditioned
  numerics.  Nevertheless, It seems to have been true that
  conditioning is employed in UNIX workstations from vendors like
  Sun, HP, DEC, and others, to specifically eliminate the type of
  error seen above, providing the intuitive result 1000*(0.3) = 300. 
  Given that many workstation users are physicists or other
  technical professionals without ingrained numerical pedantry, the
  use of technically incorrect code like that illustrated above is
  probably not uncommon.

  The initial response was to file a bug report with the glibc
  project, under the (later found to be incorrect) assumption that
  the underlying common code in the string to floating point
  conversion routines atof(), strtod(), scanf(), etc. was at fault.

    Number:         1595
    Category:       libc
    Synopsis:       sscanf(), atof() conversion to double precision,
                    roundoff error.
    Environment:
    Red Hat Linux 6.0, glibc 2.1.1
    Description:
      Double precision returns from scanf() and atof() (at least) lack
      precision for certain values, causing behavior (i.e. a broken
      program) that doesn't happen on FreeBSD or Solaris.
      e.g., sscanf("0.3", "&lf", &d); (int)(1000*d) is 299.

  The response from the developers contained the following:

    "0.3 cannot be represented as an exact floating point number. 
      There's nothing we can do."

    "What you see is the difference in the implementation of the
    conversion functions.  Some implementation use the complex and
    slow method to ensure that the converted binary number
    afterwards results to the same output value (after appropriate
    rounding).  This is *not* what ISO C and C++ require (it is
    required for LISP etc).  The implementation in glibc is creating
    the correct result, i.e., the number with the smallest error
    from the real result.  This is different.  I will not change
    this since

      a) it is more correct

      b) it is much faster"

  At this point, the matter was shelved for several months, until
  the author was contacted by a software developer from a large EDA
  tool vendor who had encountered similar problems in porting legacy
  code to Linux.  That individual submitted a request to the glibc
  developer's email group, which elicited, amongst several replies,
  the key to the problem (give credit to Ulrich Drepper,
  This email address is being protected from spambots. You need JavaScript enabled to view it.).

  First, the string to double conversion functions are not the
  issue, and the "complex and slow method..." mentioned above is not
  the issue.  It was accidental that the problem was first noticed
  in an expression that used a string conversion function, however
  code like

      double d = 0.3;
      int i = (int)(1000*d);

  exhibits the same behavior.

  Second, the processor rounding mode is not by itself the key. 
  Setting the Linux rounding mode does not change the result of the
  indicated integer conversion in the default X86 FPU mode under
  Linux/glibc.

  The key is in fact the precision mode of the X86 FPU.  The
  "conditioning", or roundoff error handling is actually implemented
  in the X86 FPU as expressed by the rounding mode.  In both Linux
  and FreeBSD, the rounding mode is set by default to "round to
  nearest".  That is, the FPU keeps three hidden bits of precision,
  and after a computation the extra bits are used to set the result
  to the closest representable number.  This is the correct setting
  for general purpose math and is the default on Linux and FreeBSD
  X86 systems.

  The difference between Linux and FreeBSD is that in Linux, the FPU
  is operated by default in "extended precision" mode, where 80-bit
  internal registers are used for floating point numbers.  In
  FreeBSD, the default is to use "double precision" mode, where
  64-bit precision is retained.

    Note added:
        mingw runtime 2000-01-22 for Windows also uses double
        precison mode by default.  Earlier versions apparently used
        extended precision.

  The problem illustrated above arises when using extended
  precision, since the operand is zero-extended from 53 to 64
  mantissa bits.  The round to nearest operation will choose the
  nearest representation of the exact result, which is then
  converted to an integer by truncation.  Since the operand is
  already "exact" to extended precision, the value will not change,
  and truncation to an integer yields the integer 299, in the
  example.

  On the other hand, if double precision is used, the mantissa of
  the operand is correct to full precision, so that the
  round-to-nearest operation favors (in this case) the next larger
  representable number, which is larger than the nearest integer. 
  The result when converting to an integer is in fact the closest
  integer (300).

  When FreeBSD is operated in extended precision mode (using the
  fpsetprec()) function) the results of the example match those of
  default glibc.  Conversely, the example run under Linux/glibc that
  sets the processor to double precision mode will provide the
  "intuitive" result of default FreeBSD and the other platforms.

  The bottom line is this- to port a legacy C/C++ program to Linux,
  one can add the following to the main function:

    #ifdef linux
    #include <fpu_control.h>
    #endif

    main(argc, **argv)
    {
    #ifdef linux
      /*
      This puts the X86 FPU in 64-bit precision mode.  The default
      under Linux is to use 80-bit mode, which produces subtle
      differences from FreeBSD and other systems, eg,
      (int)(1000*atof("0.3")) is 300 in 64-bit mode, 299 in 80-bit
      mode.
      */
      fpu_control_t cw;
      _FPU_GETCW(cw);
      cw &= ~_FPU_EXTENDED;
      cw |= _FPU_DOUBLE;
      _FPU_SETCW(cw);
    #endif

  This code, specific to X86 Linux and gcc, puts the FPU into double
  precision mode for the duration of the program execution.

  Caveat:
    It is unknown if there are subtle effects in glibc math
    operations resulting from this change.  The glibc developers
    discourage the use of this option, recommending instead that the
    source code be modified to provide explicit rounding control
    where needed.  Use of double precision mode should probably be
    employed only when modification of the source code is
    impractical.

  According to Intel, reduced precision is used only in in the FADD,
  FSUB, FMUL, FDIV, and FSQRT instructions.  Otherwise, the extended
  precision is used.  This would seem to imply that double precision
  mode would not affect returns from processor-implemented
  transcendental functions, for example.  However, Bessel functions,
  for example, and other functions that are implemented in software
  may become more susceptible to round-off error, depending upon
  how intermediate results are stored.

  Since in conventional C/C++ all operands are delivered and
  returned as 64-bit (or less precise) quantities, it is not clear
  that the added precision of 80-bit mode is important practically,
  considering that there are already extra precision "hidden" bits
  employed for rounding.  However, it is very likely that someone
  can come up with an example of where the extra precision does make
  an important difference.  On the other hand, assuming that most
  proprietary workstations use 64-bit internal precision, the
  results should be consistent with the workstations.


  Additional Information
  ----------------------

  A good tutorial on issues in floating-point arithmetic can be found
  in 

    David Goldberg, What every computer scientist should know about
    floating-point arithmetic, ACM Computing Surveys 23, vol 1
    (1991-03), pp.  5-48
    <http://www.validgh.com/goldberg/paper.ps>

  Of particular interest are the last few pages of the postscript
  document referenced above (which contain supplemental information
  apparently not in the original paper), where pitfalls of using
  extended precision are discussed.  In particular, one potential
  pitfall is comparison between two naively identical results,
  where at the whim of the compiler one result is computed using
  internal registers only, and the other result has a double
  precision intermediate.  The comparison can "mysteriously" fail.

  The gcc compiler, at least recent versions, supports a "long
  double" data type which, at 12 bytes on X86 systems, represents
  the full precision of the internal registers.  This allows the
  round to nearest mode to work "correctly" in extended precision
  mode, i.e.,

      long double d = 0.3;
      int i = (int)(1000*d)  // i = 300

  However, atof() and other standard math functions return ordinary
  doubles, thus the problem remains for returns from these
  functions.

  In glibc-2.1, there are long double versions of most if not all of
  the math functions, suffixed with an 'l', i.e., sinl, expl, etc. 
  There is no mention of this in the man pages on RH 6.0, but the
  info file for glibc describes these extensions.

  Also in glibc, an "L" modifier in the printf/scanf functions and
  friends denotes "long double" when applied to e/f/g.

  There is no "atofl" function, but there is a strtold function
  which is supposed to return a long double result.  However, in RH
  6.0, this seems to be broken.  The sscanf("%Ld") function appears
  to work correctly.

      // we're using extended precision
      int i;
      char *a = "0.3";
      long double d;
      d = strtold(a, 0);
      printf("%Lg\n", d);  // prints "-1.71799e+09"
      sscanf(a, "%Lf", &d);
      printf("%Lg\n", d);  // prints "0.3"
      i = (int)(1000*d);   // i = 300