Floating Point Math

Hi,
I have the following code:
float totalSpent;
int intBudget;
float moneyLeft;
totalSpent += Amount;
moneyLeft = intBudget - totalSpent;
And this is how it looks in the debugger: http://www.braginski.com/math.tiff
Why is moneyLeft, as calculated by the code above, off by .02 compared to the same expression evaluated by the debugger?
The debugger's expression window is correct, yet the code above produces a result that is wrong by .02. It only happens for very large numbers (yet way below the int limit).
thanks
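
A likely explanation, sketched here under assumptions: a 32-bit float carries a 24-bit significand, so near 900,000 adjacent float values are 0.0625 apart, and anything stored in a float is rounded to that grid. A debugger that evaluates the expression in double precision would not show the same rounding. The Python sketch below emulates 32-bit rounding with struct; the helper names to_float32 and float32_spacing and the sample amount are hypothetical, not taken from the post.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent 32-bit floats around x (normal range, x != 0)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 24)     # a float has a 24-bit significand


# Hypothetical amount of the same magnitude as in the question.
amount = 899999.41
print(float32_spacing(amount))      # 0.0625
print(f"{to_float32(amount):.4f}")  # 899999.4375
```

With a grid that coarse, an error of a few cents after one or two operations is expected rather than surprising.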

Thank you all for the help!
Could someone please point out why the first value is printed incorrectly, while the second is correct:
NSDecimalNumber *intBalance;
NSDecimalNumber *Amount;
NSDecimalNumber *leftAmount;
NSNumberFormatter *currencyStyle;
NSDecimalNumberHandler *handler = [NSDecimalNumberHandler decimalNumberHandlerWithRoundingMode:NSRoundPlain
scale:2 raiseOnExactness:NO raiseOnOverflow:NO
raiseOnUnderflow:NO raiseOnDivideByZero:NO];
currencyStyle = [[NSNumberFormatter alloc] init];
[currencyStyle setFormatterBehavior:NSNumberFormatterBehavior10_4];
[currencyStyle setNumberStyle:NSNumberFormatterCurrencyStyle];
intBalance = [NSDecimalNumber decimalNumberWithString:@"999999"];
Amount = [NSDecimalNumber decimalNumberWithString:@"99999.59"];
leftAmount = [intBalance decimalNumberBySubtracting: Amount withBehavior: handler];
NSLog(@"Number is: %.2f, %@", [leftAmount floatValue], [currencyStyle stringFromNumber:leftAmount]);
Number is: 899999.44, $899,999.41
Message was edited by: leonbrag
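
One way to read the output above, offered as a sketch rather than a statement about NSDecimalNumber internals: the subtraction itself is exact in decimal, and 899999.44 appears only when the result is squeezed into a 32-bit float for printing (floatValue). The Python analogue below uses decimal.Decimal in place of NSDecimalNumber; to_float32 is a hypothetical helper that emulates 32-bit rounding.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


balance = Decimal("999999")
amount = Decimal("99999.59")
left = balance - amount                    # exact decimal arithmetic

print(left)                                # 899999.41
print(f"{to_float32(float(left)):.2f}")    # 899999.44 (32-bit float)
print(f"{float(left):.2f}")                # 899999.41 (64-bit double)
```

If that reading is right, printing through the formatter or through a double-precision accessor keeps the cents intact, while the 32-bit conversion is what loses them.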

Similar Messages

  • So... if there's no floating point math, how do all those calculators work?

    I see there's no floating point math support in J2ME.
    So how do those calculators work? There's lots of 'em for download.
    I have an app that's almost done; all I need to do is finish the calculations, but I need to read Strings from TextArea's and compute values based on them.
    What am I missing?

    You can use third party libraries
    (MathFP http://www.jscience.net)
    or
    implement them with classical algorithms.
    Carlos Sanchez
    [Intesys]
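
A common "classical algorithm" for platforms without floating point is fixed-point arithmetic: store each value as an integer scaled by a power of two. The sketch below is a minimal Python illustration of a 16.16 format, not the MathFP API; all names (to_fixed, fixed_mul, fixed_div, fixed_to_str) are hypothetical. On a device with 32-bit ints the intermediate product in fixed_mul would need a 64-bit type.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # the fixed-point representation of 1.0


def to_fixed(text: str) -> int:
    """Parse a decimal string such as "-12.5" into 16.16 fixed point, integers only."""
    text = text.strip()
    negative = text.startswith("-")
    whole, _, frac = text.lstrip("+-").partition(".")
    value = int(whole or "0") * ONE
    if frac:
        value += int(frac) * ONE // 10 ** len(frac)
    return -value if negative else value


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values; the product carries 32 fraction bits."""
    return (a * b) >> FRAC_BITS


def fixed_div(a: int, b: int) -> int:
    """Divide two fixed-point values, pre-scaling the dividend."""
    return (a << FRAC_BITS) // b


def fixed_to_str(value: int, places: int = 3) -> str:
    """Format a fixed-point value as a decimal string (truncating)."""
    sign = "-" if value < 0 else ""
    value = abs(value)
    whole = value >> FRAC_BITS
    frac = ((value & (ONE - 1)) * 10 ** places) >> FRAC_BITS
    return f"{sign}{whole}.{frac:0{places}d}"


product = fixed_mul(to_fixed("1.5"), to_fixed("2.25"))
print(fixed_to_str(product))  # 3.375
```

Because the input is parsed from a string, text read from a TextArea can be converted without ever touching a float.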

  • Inconsistent Floating Point Math and NaNs on Windows Laptops?

    All -
    I am seeing some very strange inconsistent floating point calculations on Windows Laptops, and I am wondering if anyone has any ideas. Read on, as (to me!) it's very interesting....
    I have attached a segment of our code, along with some sample output. Looking at the code and the output, it seems like it's totally impossible.
    With supposedly non-NaN and non-infinite double values, I am seeing unrepeatable and inconsistent math - the below example only illustrates one such case where we're seeing this behavior.
    If you look at the code below, you will see that I do things like:
    double rhoYo = ...  // some math
    double rho = ...  // exact same math
    Strangely enough, I might get rhoYo = 1.51231231 etc and rho = NaN.
    If I reverse those lines (vertically), then again, rhoYo comes out good and rho comes out NaN; however, this is unpredictable and inconsistent. If I project a source point, get a destination point with NaNs as a result, and project the source again, the second destination point may be just fine. As a matter of fact, I can put a loop in the code such as:
          double rho = Double.NaN;
          for( int i = 0; i < 10; i++ ) {
            rho = my_earthRad * my_F / Math.pow(Math.tan(Math.PI/4.0 + latRad/2.0), my_n);
            if( ! Double.isNaN( rho ) )
              break;
            System.out.println("NaN'ed rho");
          }
    and rho will eventually become non-NaN (random # of iterations)!!
    How's that possible? Our code might be tromping on memory somewhere, but this sure seems crazy to me, especially considering that
    we're only using local variables. Anyone know of floating point errors on Windows Laptops?
    With the exact same codebase, this behavior ONLY happens on Windows Laptops, including brand new Dells, old Dells, IBM, Intel and AMD chips (I've tried several ;-). It does NOT happen on Mac or Linux, including the Linux side of a Linux/Windows dual-boot (where it does happen with the Windows side). Even more strangely, it does NOT happen with Windows desktops. I have tried several 1.5.x JVMs, webstart vs no webstart, etc, to no avail. Always and only on Windows Laptops.
    Please help.... ;-) and thanks in advance.
    Sample code:
    import java.awt.geom.Point2D;

    public class Projection
    {
      protected Point2D.Double _project(Point2D.Double srcPt, Point2D.Double dstPt) {
        final double my_degToRad = Math.PI / 180.0;
        final double my_originLon = -95.0;
        final double my_originLonRad = my_originLon * my_degToRad;
        final double my_originLat = 25.0;
        final double my_originLatRad = my_originLat * my_degToRad;
        final double my_stdLat1 = 25.0;
        final double my_stdLat1Rad = my_stdLat1 * my_degToRad;
        final double my_earthRad = 6371.2;
        final double my_n = Math.sin( my_stdLat1Rad );
        final double my_F = Math.cos( my_stdLat1Rad ) * Math.pow( Math.tan( Math.PI / 4.0 + my_stdLat1Rad / 2.0 ), my_n ) / my_n;
        final double my_rhoZero = my_earthRad * my_F / Math.pow( Math.tan( Math.PI / 4.0 + my_originLatRad / 2.0 ), my_n );
        // Bail out early if any of the projection constants is already NaN.
        if ( Double.isNaN( my_n ) || Double.isNaN( my_F ) || Double.isNaN( my_rhoZero )) {
          return new Point2D.Double(Double.NaN, Double.NaN);
        }
        if( Double.isNaN( srcPt.x ) || Double.isNaN( srcPt.y ) )
        {
            System.out.println("======= _project received a srcPt with NaNs. Returning NaN point.");
            Point2D.Double nanPoint = new Point2D.Double();
            nanPoint.x = nanPoint.y = Double.NaN;
            return nanPoint;
        }
        if( Double.isInfinite( srcPt.x ) || Double.isInfinite( srcPt.y ) )
        {
            System.out.println("======= _project received a srcPt with isInfinite. Returning NaN point.");
            Point2D.Double nanPoint = new Point2D.Double();
            nanPoint.x = nanPoint.y = Double.NaN;
            return nanPoint;
        }
        //  Inputs are lon, lat degrees.
        final double lonRad = srcPt.x * my_degToRad;
        final double latRad = srcPt.y * my_degToRad;
        final double theta = my_n * (lonRad - my_originLonRad);
        // One Std lat -- tangential cone.
        // rhoYo and rho are deliberately computed from the identical expression.
        final double rhoYo = my_earthRad * my_F / Math.pow(Math.tan(Math.PI/4.0 + latRad/2.0), my_n);
        final double rho   = my_earthRad * my_F / Math.pow(Math.tan(Math.PI/4.0 + latRad/2.0), my_n);
        // Computes kilometers in lambert space.
        dstPt.x = rho * Math.sin(theta);
        dstPt.y = my_rhoZero - (rho * Math.cos(theta));
        // WANK - Here's the problem!  These values shouldnt be NaN!
        if( Double.isNaN( dstPt.x ) || Double.isNaN( dstPt.y ) )
        {
            System.out.println("======= A _projected dstPt has NaNs. Dumping...vvvvvvvvvvvvvvvvvvvvvvvvvvvv");
            if( Double.isNaN( dstPt.x ) )
                System.out.println("======= dstPt.x is NaN");
            if( Double.isNaN( dstPt.y ) )
                System.out.println("======= dstPt.y is NaN");
            System.out.println("======= my_stdLat1 = " + my_stdLat1 );
            System.out.println("======= my_n = " + my_n );
            System.out.println("======= my_originLonRad = " + my_originLonRad );
            System.out.println("======= my_F = " + my_F );
            System.out.println("======= my_earthRad = " + my_earthRad );
            System.out.println("======= lonRad = " + lonRad );
            System.out.println("======= latRad = " + latRad );
            System.out.println("======= theta = " + theta );
            System.out.println("======= Math.tan(Math.PI/4.0 + latRad/2.0) = " + Math.tan(Math.PI/4.0 + latRad/2.0) );
            System.out.println("======= Math.pow(Math.tan(Math.PI/4.0 + latRad/2.0), my_n) = " + Math.pow(Math.tan(Math.PI/4.0 + latRad/2.0), my_n) );
            System.out.println("======= rho = " + rho );
            System.out.println("======= rhoYo = " + rhoYo );
            System.out.println("======= Math.sin(theta) = " + Math.sin(theta) );
            System.out.println("======= dstPt.x = " + dstPt.x );
            System.out.println("======= Math.cos(theta) = " + Math.cos(theta) );
            System.out.println("======= my_rhoZero = " + my_rhoZero );
            // The next two labels say rhoYo, but the values are computed from rho.
            System.out.println("======= (rhoYo * Math.cos(theta)) = " + (rho * Math.cos(theta)) );
            System.out.println("======= my_rhoZero - (rhoYo * Math.cos(theta)) = " + (my_rhoZero - (rho * Math.cos(theta)) ));
            System.out.println("======= dstPt.y = " + dstPt.y );
            System.out.println("======= A _projected dstPt had NaNs. Done dumping. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^");
        }
        return dstPt;
      }
    }
    And here's the sample output:
    ======= A _projected dstPt has NaNs. Dumping...vvvvvvvvvvvvvvvvvvvvvvvvvvvv
    ======= dstPt.x is NaN
    ======= dstPt.y is NaN
    ======= my_stdLat1 = 25.0
    ======= my_n = 0.42261826174069944
    ======= my_originLonRad = -1.6580627893946132
    ======= my_F = 2.5946660025799146
    ======= my_earthRad = 6371.2
    ======= lonRad = -2.7564670759053924
    ======= latRad = 0.3730758324037379
    ======= theta = -0.4642057102537187
    ======= Math.tan(Math.PI/4.0 + latRad/2.0) = 1.4652768116539785
    ======= Math.pow(Math.tan(Math.PI/4.0 + latRad/2.0), my_n) = 1.175224090766834
    ======= rho = NaN
    ======= rhoYo = 14066.369269924155
    ======= Math.sin(theta) = -0.44771270676160557
    ======= dstPt.x = NaN
    ======= Math.cos(theta) = 0.8941774612481554
    ======= my_rhoZero = 13663.082491950498
    ======= (rhoYo * Math.cos(theta)) = NaN
    ======= my_rhoZero - (rhoYo * Math.cos(theta)) = NaN
    ======= dstPt.y = NaN
    ======= A _projected dstPt had NaNs. Done dumping. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
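
As a cross-check on the dump above, the same formula can be evaluated outside the JVM. The sketch below is a Python port of the rho computation using the constants from the posted code and the latRad value from the sample output; the function name lambert_rho and the upper-case constant names are hypothetical. If it agrees with the rhoYo line, that suggests the inputs are benign and the NaN is not a property of the formula itself.

```python
import math

EARTH_RAD = 6371.2                     # my_earthRad in the post
STD_LAT1_RAD = 25.0 * math.pi / 180.0  # my_stdLat1Rad
N = math.sin(STD_LAT1_RAD)             # my_n
F = (math.cos(STD_LAT1_RAD)
     * math.pow(math.tan(math.pi / 4.0 + STD_LAT1_RAD / 2.0), N) / N)  # my_F


def lambert_rho(lat_rad: float) -> float:
    """Radius term of the one-standard-parallel Lambert projection."""
    return EARTH_RAD * F / math.pow(math.tan(math.pi / 4.0 + lat_rad / 2.0), N)


rho = lambert_rho(0.3730758324037379)  # latRad from the sample output
print(math.isnan(rho), rho)            # finite, close to the rhoYo value in the dump
```

The last digits may differ slightly between math libraries, so the comparison with the dump should use a tolerance rather than exact equality.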

    Hi JSchell (and others?) -
    I have created a simple example, attached below, that when run repeatedly does indeed generate spurious NaNs. I have made it as simple as possible. In the code, I use my own lon/lat binary data file, though I am sure any will do. Let me know if anyone wants that file.
    So the deal is that (with my data at least) this program should never generate NaN results. And if one runs it 25432 (e.g. a random #) times, it won't, but then the 25433rd time, it will create NaNs, etc., i.e. inconsistent NaN math results.
    As I said before, I have run this on old and new Dell laptops under Windows XP, using Java 1.5_02, 1.5_04 and 1.5_08. The latest run was on a brand new Dell with an Intel Core Duo T2600 processor, running XP. If this is a result of the Pentium bug, one would think that would be fixed by now. I have NOT yet tested on AMD, though I will do that this afternoon.
    Remember, this ONLY happens with Windows Laptops.
    Any ideas anyone? Thanks in advance ;-)
    Here's the code that produces spurious NaNs:
    import java.awt.geom.Point2D;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.File;
    import java.io.FileInputStream;
    public class FloatingPointTest2 implements Runnable
    {
      private static final int NUM_ITERATIONS = 100000;
      private double _degToRad = Math.PI / 180.0;
      private double _originLon = -95.0;
      private double _originLat = 25.0;
      private double _originLonRad = _originLon * _degToRad;
      private double _originLatRad = _originLat * _degToRad;
      private double _stdLat1 = 25.0;
      private double _stdLat1Rad = _stdLat1 * _degToRad;
      private double _earthRad = 6371.2;
      private double _n = Math.sin( _stdLat1Rad );
      private double _F = Math.cos( _stdLat1Rad ) * Math.pow( Math.tan( Math.PI / 4.0 + _stdLat1Rad / 2.0 ), _n ) / _n;
      private double _rhoZero = _earthRad * _F / Math.pow( Math.tan( Math.PI / 4.0 + _originLatRad / 2.0 ), _n );
      private Point2D.Double _project( Point2D.Double srcPt, Point2D.Double dstPt )
      {
        if( Double.isNaN( srcPt.x ) || Double.isNaN( srcPt.y ) )
        {
          System.out.println( "FloatingPointTest2: received a NaN srcPt.  Skipping." );
          return new Point2D.Double( Double.NaN, Double.NaN );
        }
        //  Inputs are lon, lat degrees.
        final double lonRad = srcPt.x * _degToRad;
        final double latRad = srcPt.y * _degToRad;
        final double theta = _n * ( lonRad - _originLonRad );
        double rho = _earthRad * _F / Math.pow( Math.tan( Math.PI / 4.0 + latRad / 2.0 ), _n );
        dstPt.x = rho * Math.sin( theta );
        dstPt.y = _rhoZero - ( rho * Math.cos( theta ) );
        return dstPt;
      }
      public void doTest()
      {
        DataInputStream instream = null;
        int thisRunNaNCount = 0;
        Point2D.Double tempPt = new Point2D.Double();
        Point2D.Double dstPt = new Point2D.Double();
        try
        {
          instream = new DataInputStream( new FileInputStream( System.getProperty(
            "user.home" ) + File.separatorChar + "lonLatBinaryData.bin" ) );
          try
          {
            // Read lon/lat pairs until EOFException signals the end of the file.
            while( true )
            {
              double lon = instream.readDouble();
              double lat = instream.readDouble();
              if( Double.isNaN( lon ) || Double.isNaN( lat ) )
                continue;
              tempPt.x = lon;
              tempPt.y = lat;
              dstPt = _project( tempPt, dstPt );
              if( Double.isNaN( dstPt.x ) || Double.isNaN( dstPt.y ) )
                thisRunNaNCount++;
            }
          }
          catch( EOFException e )
          {
            // System.out.println( "End of file" );
          }
          if( thisRunNaNCount > 0 )
            System.out.println( "thisRunNaNCount = " + thisRunNaNCount );
          instream.close();
        }
        catch( Exception e )
        {
          e.printStackTrace();
          System.exit( 1 );
        }
      }
      public void run()
      {
        doTest();
      }
      public static void main( String args[] )
      {
        System.out.println( "Executing FloatingPointTest2." );
        for( int i = 0; i < NUM_ITERATIONS; i++ )
        {
          FloatingPointTest2 test = new FloatingPointTest2();
          test.doTest();
        }
      }
    }

  • Stumped on basic problem with floating point math

    I can't figure this out!  It should be sooooo simple. 
    Here is the challenge: 
    I have an incoming time array.  For example:     0, 1, 2, 3, 4, 5, 6, 7, 8, 9
    I want to scale this array by a constant (e.g. multiply by 0.1).  So the resulting array should be:     0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
    Then I want to calculate the difference between each pair of consecutive elements in the array.  In this example, the difference should be 0.1 between every element.  But my comparison fails. 
    See the example below.  As far as I can see, the resulting boolean should always be TRUE.  But it's not.
    But if I remove the scaling operation, then it works ok!
    Please help!

    It has been awhile since smercurio has had a contribution to his retirement fund, but once again you have "discovered" that there is no exact binary representation for 0.1.
    I'd use one of the current "almost equals" comparisons described here:
    http://forums.ni.com/t5/LabVIEW/Darin-s-Weakly-Nugget-2-8-11/m-p/1444262
    And vote for this if you haven't already:
    http://forums.ni.com/t5/LabVIEW-Idea-Exchange/quot-Almost-Equal-quot-functions-for-Float-comparisons...
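
The same effect, and an "almost equals" comparison, can be reproduced in a text language. The sketch below is a Python illustration, not the LabVIEW VI from the question; the tolerance of 1e-9 is an arbitrary assumption and should be chosen to suit the data.

```python
import math

times = [i * 0.1 for i in range(10)]                 # scaled time array
diffs = [b - a for a, b in zip(times, times[1:])]    # consecutive differences

exact = [d == 0.1 for d in diffs]                    # naive comparison
close = [math.isclose(d, 0.1, rel_tol=1e-9) for d in diffs]

print(all(exact))   # False: some differences are not exactly 0.1
print(all(close))   # True: every difference is within tolerance
```

Comparing against a tolerance sidesteps the fact that 0.1 has no exact binary representation.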

  • Floating Point Arithmetic Error

    Hi,
    I know ActionScript represents numbers as double precision
    floating point values. I'm having a problem where double arithmetic
    in ActionScript doesn't match the results of the same double
    arithmetic in C++ / C#.
    EXAMPLE:
    In C++ / C#:
    double x, y, x1, y1;
    x = 209.4;
    y = 148.8;
    x1 = 203.0;
    y1 = 145.0;
    double ddx = x - x1;
    double ddy = y - y1;
    RESULT
    ddx: 6.4000000000000057
    ddy: 3.8000000000000114
    In Flash ActionScript 2:
    var x, y, x1, y1;
    x = 209.4;
    y = 148.8;
    x1 = 203.0;
    y1 = 145.0;
    var ddx = x - x1;
    var ddy = y - y1;
    RESULT
    ddx: 6.39999999999992
    ddy: 3.80000000000024
    After researching Flash / Actionscript "var" stores numerical
    values as doubles ( 8 bytes ) just like doubles are stored in C++ /
    C# ( 8 bytes ). Why would there be a difference between the results
    of ddx and ddy? Are there different implementations of double
    floating point math? If so, Is there a way I can mimic the Flash /
    Actionscript version in C++ / C#?
    Any help would be great!
    Thanks!

    Hmmm, so you're saying the actual binary representation is
    the same but they're just displayed differently?
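
One way to test the "displayed differently" idea is to format the same IEEE 754 double with different numbers of significant digits. The Python sketch below does that for the values in the question; Python floats are 64-bit doubles, and the choice of 17 and 15 digits is an assumption about what each environment might print.

```python
ddx = 209.4 - 203.0
ddy = 148.8 - 145.0

print(f"{ddx:.17g}", f"{ddy:.17g}")   # 6.4000000000000057 3.8000000000000114
print(f"{ddx:.15g}", f"{ddy:.15g}")   # 6.40000000000001 3.80000000000001
```

The 17-digit output matches the C++ / C# results, but the 15-digit output does not match the Flash results (6.39999999999992 and 3.80000000000024). That hints that the Flash values themselves differ by a few units in the last place rather than only being displayed differently. This is an inference from the arithmetic, not something confirmed in the thread.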

  • Floating-point addition unit

    Hello All,
    I am very new to NI and I am looking for an example of a floating-point addition unit. Basically, I am looking for an example that performs addition of two floating point numbers (single or double precision), i.e. how to create a floating-point addition unit using a logic gate model (AND, OR, XOR, etc.).
    Thank you!
    Leo
    Message Edited by smuStudent on 12-05-2005 11:51 PM

    Most (if not all) of us, when we want to do floating point math just put the Add function onto the block diagram. I would suggest you google for floating point gates or something similar. You can create subVIs that make up the basic functions such as a full adder and then link them together as needed.
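
The full-adder building block mentioned in the reply can be sketched in text form. The Python below models a one-bit full adder from XOR, AND and OR, then chains copies into a ripple-carry adder; the function names and the 24-bit default width (the significand width of single precision) are illustrative assumptions. A complete floating-point adder also needs exponent alignment, normalization and rounding, which this sketch does not cover.

```python
from typing import Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One-bit full adder built from XOR, AND and OR only."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def ripple_carry_add(x: int, y: int, width: int = 24) -> Tuple[int, int]:
    """Add two unsigned width-bit integers bit by bit; returns (sum, carry_out)."""
    result = 0
    carry = 0
    for bit in range(width):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry


print(ripple_carry_add(5, 3, width=8))   # (8, 0)
```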

  • Floating point formats: Java/C/C++, PPC and Intel platforms

    Hi everyone
    Where can I find out about the various bit formats used for 32 bit floating numbers in Java and C/C++ for both Mac hardware platforms?
    I'm developing a Java audio application which needs to convert vast quantities of variable width integer audio samples to canonical float audio format. I've discovered that a floating point divide by the maximum integer value gives the correct answer but takes too much processor time, so I'm trying out bit-twiddling in C via JNI to carve out my own floating point bit patterns. This is very fast, however, I need to take into account the various float formats used on the different platforms so my app can be universal. Can anyone point me to the information?
    Thanks in advance.
    Bob

    I am not sure that Rosetta floating point works the same as PPC floating point. I was using RealBasic (a PPC BASIC compiler) and moved one of my compiled applications to a MacBook Pro, and floating point comparisons that had been exact on the PPC stopped working under Rosetta. I changed the code to do an approximate comparison (i.e. abs(a - b) < tolerance) and this fixed things.
    I reported the problem to the RealBasic people and thought nothing more of it until I fired up Adobe's InDesign and not being used to working with picas, changed the units of measurement to inches. The default letter paper size was suddenly 8.5000500050005 inches instead of the more usual 8.5! This was not a big problem, but it appears that all of InDesign's page math is running into some kind of rounding errors.
    The floating point format is almost certainly IEEE, and I cannot imagine Rosetta doing anything other than using native hardware Intel floating point. On the other hand, there is a subtle difference in behavior.
    I am posting this here as a follow up, but I am also going to post this as a proper question in the forum. If you have to delete one or the other of these duplicate posts, please zap the reply, not the question.
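
On the original question about bit formats: a 32-bit IEEE 754 float has 1 sign bit, 8 exponent bits (biased by 127) and 23 fraction bits, and what usually differs between PowerPC and Intel is the byte order in memory, not the format. The Python sketch below illustrates both points with struct; float32_fields is a hypothetical helper name.

```python
import struct


def float32_fields(x: float) -> tuple:
    """Split a value, rounded to 32-bit float, into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased by 127
    fraction = bits & 0x7FFFFF       # 23 explicit fraction bits
    return sign, exponent, fraction


big = struct.pack(">f", 1.0)       # big-endian byte order (PowerPC)
little = struct.pack("<f", 1.0)    # little-endian byte order (Intel)
print(big.hex(), little.hex())     # 3f800000 0000803f
print(float32_fields(-0.5))        # (1, 126, 0)
```

Bit-twiddling code that assembles the 32-bit pattern as an integer and reinterprets it is therefore portable as long as it does not assume a particular byte order when reading or writing raw buffers.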

  • Inline functions in C, gcc optimization and floating point arithmetic issues

    Over the past several days I have really become a fan of Alchemy. But after intensive testing I have found several issues which I'd like to solve but can't without help.
    So... I'm porting an old game console emulator that I wrote in ANSI C. The code works with both gcc and Visual Studio without any modification or cross-compile macros. The only platform code is the audio and video output, which is out of scope because I have ported audio and video within AS3.
    Here are the issues:
    1. Inline functions - Having only a single inline function makes the code work incorrectly (although not crash) whether optimization is enabled or not (-O0 or -O3). My current workaround is converting the inline functions to macros, which achieves the same effect. Any ideas why inline functions break the code?
    2. Compiler optimizations - well, my project consists of many C files one of which is called flash.c and it contains the main and exported functions. I build the project as follows:
    gcc -c flash.c -O0 -o flash.o     //Please note the -O0 option!!!
    gcc -c file1.c -O3 -o file1.o
    gcc -c file2.c -O3 -o file2.o
    ... and so on
    gcc *.o -swc -O0 -o emu.swc   //Please note the -O0 option again!!!
    mxmlc.exe -library-path+=emu.swc --target-player=10.0.0 Emu.as
    for file in $( ls *.o ) # Removes the obj files
        do
            rm $file
        done
    If I define any option different from -O0 in gcc -c flash.c -O0 -o flash.o the program stops working correctly, exactly as with the inline functions (but it still does not crash or print any errors in debug). flash.c has 4 static functions to be exported to AS3 and the main function. Do you know why?
    If I define any option different from -O0 in gcc *.o -swc -O0 -o emu.swc the program stops working correctly exactly as above, but if I specify -O1, -O2 or -O3 the SWC file gets smaller, up to 2x for -O3. Why? Is there a method to optimize all the obj files except flash.o, because I suspect a similar issue as when compiling it?
    3. Floating point issues - this is the worst one. My code is mainly based on integer arithmetic, but in one or two places it requires floating point arithmetic. One of them is the conversion of a 16-bit 44.1 kHz sound buffer to a float buffer with the same sample rate but with samples in the range from -1.0 to 1.0.
    My code:
    void audio_prepare_as()
    {
        uint32 i;
        /* Convert two samples per iteration, scaling 16-bit values to -1.0..1.0. */
        for(i=0;i<audioSamples;i+=2)
        {
            audiobuffer[i] = (float)snd.buffer[i]/32768;
            audiobuffer[i+1] = (float)snd.buffer[i+1]/32768;
        }
    }
    My audio playback works perfectly, but not when using the above conversion. I have inspected the float numbers and they are all incorrect and invalid. I tried other code with simple floats - same story. It is as if Alchemy refuses to work with floats. What is wrong? I have another place where I must resize the framebuffer and there I have a float involved - same problem. Please help me?
    Found the floating point problem: audiobuffer is written to a ByteArray and then used in AS. But C floats are obviously not the same as those in AS3. Now the floating point issue is resolved.
    The optimization issues remain! I really need to speed up my code.
    Thank you in advance!
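
The post does not say what the mismatch between C floats and AS3 floats was. One plausible cause, stated here as an assumption, is byte order: ActionScript's ByteArray defaults to big-endian, whereas floats written from C on a little-endian target are little-endian. The Python sketch below shows the sample conversion and what happens when the bytes are read back with the wrong byte order; the function name and sample values are hypothetical.

```python
import struct


def pcm16_to_float(samples):
    """Scale signed 16-bit samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


samples = [-32768, 0, 16384]          # hypothetical 16-bit samples
floats = pcm16_to_float(samples)      # [-1.0, 0.0, 0.5]

raw = struct.pack("<3f", *floats)     # written little-endian
right = struct.unpack("<3f", raw)     # read with the matching byte order
wrong = struct.unpack(">3f", raw)     # read with the opposite byte order

print(right)   # (-1.0, 0.0, 0.5)
print(wrong)   # tiny denormal values, nothing like the originals
```

If byte order was indeed the cause, setting the ByteArray's endian property to little-endian before reading would be the corresponding fix on the AS3 side.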

    Dear Bernd,
    I am still unable to run the optimizations and turn on the inline functions. None of the inline functions contain any stdlib function, just pure assignments, reads, simple arithmetic and bitwise operations.
    In fact, the file containing the main function and those functions for export in AS3 did have memset and memcpy. I tried your suggestion and put the code above the functions calling memset and memcpy. It did not work, so I put the code in a header which is included topmost in each C file. The only system header I use is malloc.h and it is included topmost. In another C file I use pow, sin and log10 from math.h, but I removed it and did the same thing:
    //shared.h
    #ifndef _SHARED_H_
    #define _SHARED_H_
    #include <malloc.h>
    /* Wrappers that forward to the AVM2 implementations via Alchemy inline asm. */
    static void * custom_memmove( void * destination, const void * source, unsigned int num ) {
      void *result;
      __asm__("%0 memmove(%1, %2, %3)\n" : "=r"(result) : "r"(destination), "r"(source), "r"(num));
      return result;
    }
    static void * custom_memcpy ( void * destination, const void * source, unsigned int num ) {
      void *result;
      __asm__("%0 memcpy(%1, %2, %3)\n" : "=r"(result) : "r"(destination), "r"(source), "r"(num));
      return result;
    }
    static void * custom_memset ( void * ptr, int value, unsigned int num ) {
      void *result;
      __asm__("%0 memset(%1, %2, %3)\n" : "=r"(result) : "r"(ptr), "r"(value), "r"(num));
      return result;
    }
    static float custom_pow(float x, int y) {
      float result;
      __asm__("%0 pow(%1, %2)\n" : "=r"(result) : "r"(x), "r"(y));
      return result;
    }
    static double custom_sin(double x) {
      double result;
      __asm__("%0 sin(%1)\n" : "=r"(result) : "r"(x));
      return result;
    }
    static double custom_log10(double x) {
      double result;
      __asm__("%0 log10(%1)\n" : "=r"(result) : "r"(x));
      return result;
    }
    #define memmove custom_memmove
    #define memcpy custom_memcpy
    #define memset custom_memset
    #define pow custom_pow
    #define sin custom_sin
    #define log10 custom_log10
    #include "types.h"
    #include "macros.h"
    #include "m68k.h"
    #include "z80.h"
    #include "genesis.h"
    #include "vdp.h"
    #include "render.h"
    #include "mem68k.h"
    #include "memz80.h"
    #include "membnk.h"
    #include "memvdp.h"
    #include "system.h"
    #include "loadrom.h"
    #include "input.h"
    #include "io.h"
    #include "sound.h"
    #include "fm.h"
    #include "sn76496.h" 
    #endif /* _SHARED_H_ */ 
    It still behaves the same way, as if nothing was changed (works incorrectly - displays jerk which does not move, whereas the image is supposed to move).
    As I am porting an emulator (Sega Mega Drive) I use many arrays of function pointers for implementing the opcodes of the CPUs. Could this be an issue?
    I did a workaround for the floating point problem but processing is very slow, so I hear only bzzt bzzt, but this is out of scope for now. The emulator compiled with gcc runs at 300 fps on a 1.3 GHz machine, whereas my non-optimized AVM2 code compiled by Alchemy produces 14 fps. The pure rendering is super fast and the problem lies in the computational power of AVM. The frame buffer and the emulation are generated in the C code and only the pixels are copied to AS3, where they are plotted in a BitmapData. On a 2.0 GHz dual core I achieved only 21 fps. The goal is 60 fps to have smooth audio and video. But this is off-topic. After all, everything works (slowly) without optimization, and I would like to turn it on somehow. Suggestions?
    Here is the file with the main function:
    #include "shared.h"
    #include "AS3.h"
    #define FRAMEBUFFER_LENGTH    (320*240*4)
    static uint8* framebuffer;
    static uint32  audioSamples;
    AS3_Val sega_rom(void* self, AS3_Val args)
    {
        int size, offset, i;
        uint8 hardware;
        uint8 country;
        uint8 header[0x200];
        uint8 *ptr;
        AS3_Val length;
        AS3_Val ba;
        AS3_ArrayValue(args, "AS3ValType", &ba);
        country = 0;
        offset = 0;
        length = AS3_GetS(ba, "length");
        size = AS3_IntValue(length);
        ptr = (uint8*)malloc(size);
        AS3_SetS(ba, "position", AS3_Int(0));
        AS3_ByteArray_readBytes(ptr, ba, size);
        //FILE* f = fopen("boris_dump.bin", "wb");
        //fwrite(ptr, size, 1, f);
        //fclose(f);
        /* A 512-byte header is present: skip it and deinterleave the ROM */
        if((size / 512) & 1)
        {
            size -= 512;
            offset += 512;
            memcpy(header, ptr, 512);
            for(i = 0; i < (size / 0x4000); i += 1)
            {
                deinterleave_block(ptr + offset + (i * 0x4000));
            }
        }
        memset(cart_rom, 0, 0x400000);
        if(size > 0x400000) size = 0x400000;
        memcpy(cart_rom, ptr + offset, size);
        /* Free allocated file data */
        free(ptr);
        /* Region detection from the ROM header */
        hardware = 0;
        for (i = 0x1f0; i < 0x1ff; i++)
        {
            switch (cart_rom[i]) {
            case 'U':
                hardware |= 4;
                break;
            case 'J':
                hardware |= 1;
                break;
            case 'E':
                hardware |= 8;
                break;
            }
        }
        if (cart_rom[0x1f0] >= '1' && cart_rom[0x1f0] <= '9') {
            hardware = cart_rom[0x1f0] - '0';
        } else if (cart_rom[0x1f0] >= 'A' && cart_rom[0x1f0] <= 'F') {
            hardware = cart_rom[0x1f0] - 'A' + 10;
        }
        if (country) hardware=country; //simple autodetect override
        //From PicoDrive
        if (hardware&8)
        {
            hw=0xc0; vdp_pal=1;
        } // Europe
        else if (hardware&4)
        {
            hw=0x80; vdp_pal=0;
        } // USA
        else if (hardware&2)
        {
            hw=0x40; vdp_pal=1;
        } // Japan PAL
        else if (hardware&1)
        {
            hw=0x00; vdp_pal=0;
        } // Japan NTSC
        else
        {
            hw=0x80; // USA
        }
        if (vdp_pal) {
            vdp_rate = 50;
            lines_per_frame = 312;
        } else {
            vdp_rate = 60;
            lines_per_frame = 262;
        }
        /*SRAM*/
        if(cart_rom[0x1b1] == 'A' && cart_rom[0x1b0] == 'R')
        {
            save_start = cart_rom[0x1b4] << 24 | cart_rom[0x1b5] << 16 |
                cart_rom[0x1b6] << 8  | cart_rom[0x1b7] << 0;
            save_len = cart_rom[0x1b8] << 24 | cart_rom[0x1b9] << 16 |
                cart_rom[0x1ba] << 8  | cart_rom[0x1bb] << 0;
            // Make sure start is even, end is odd, for alignment
            // A ROM that I came across had the start and end bytes of
            // the save ram the same and wouldn't work.  Fix this as seen
            // fit, I know it could probably use some work. [PKH]
            if(save_start != save_len)
            {
                if(save_start & 1) --save_start;
                if(!(save_len & 1)) ++save_len;
                save_len -= (save_start - 1);
                saveram = (unsigned char*)malloc(save_len);
                // If save RAM does not overlap main ROM, set it active by default since
                // a few games can't manage to properly switch it on/off.
                if(save_start >= (unsigned)size)
                {
                    save_active = 1;
                }
            }
            else
            {
                save_start = save_len = 0;
                saveram = NULL;
            }
        }
        else
        {
            save_start = save_len = 0;
            saveram = NULL;
        }
        return AS3_Int(0);
    }
    AS3_Val sega_init(void* self, AS3_Val args)
    {
        system_init();
        /* Stereo samples per video frame at 44.1 kHz */
        audioSamples = (44100 / vdp_rate)*2;
        framebuffer = (uint8*)malloc(FRAMEBUFFER_LENGTH);
        return AS3_Int(vdp_rate);
    }
    AS3_Val sega_reset(void* self, AS3_Val args)
    {
        system_reset();
        return AS3_Int(0);
    }
    AS3_Val sega_frame(void* self, AS3_Val args)
    {
        uint32 width;
        uint32 height;
        uint32 x, y;
        uint32 di, si, r;
        uint16 p;
        AS3_Val fb_ba;
        AS3_ArrayValue(args, "AS3ValType", &fb_ba);
        system_frame(0);
        AS3_SetS(fb_ba, "position", AS3_Int(0));
        width = (reg[12] & 1) ? 320 : 256;
        height = (reg[1] & 8) ? 240 : 224;
        for(y=0;y<240;y++)
        {
            for(x=0;x<320;x++)
            {
                /* Parenthesized: '+' binds tighter than '<<', so the original
                   1280*y + x<<2 evaluated as (1280*y + x)<<2 and indexed past
                   the 320*240*4 byte framebuffer. */
                di = 1280*y + (x<<2);
                si = (y << 10) + ((x + bitmap.viewport.x) << 1);
                p = *((uint16*)(bitmap.data + si));
                /* Expand each 5-bit color channel to 8 bits */
                framebuffer[di + 3] = (uint8)((p & 0x1f) << 3);
                framebuffer[di + 2] = (uint8)(((p >> 5) & 0x1f) << 3);
                framebuffer[di + 1] = (uint8)(((p >> 10) & 0x1f) << 3);
            }
        }
        AS3_ByteArray_writeBytes(fb_ba, framebuffer, FRAMEBUFFER_LENGTH);
        AS3_SetS(fb_ba, "position", AS3_Int(0));
        /* Pack width and height into one return value */
        r = (width << 16) | height;
        return AS3_Int(r);
    }
    AS3_Val sega_audio(void* self, AS3_Val args)
    {
        AS3_Val ab_ba;
        AS3_ArrayValue(args, "AS3ValType", &ab_ba);
        AS3_SetS(ab_ba, "position", AS3_Int(0));
        /* Copy the raw 16-bit samples; conversion to float happens on the AS3 side */
        AS3_ByteArray_writeBytes(ab_ba, snd.buffer, audioSamples*sizeof(int16));
        AS3_SetS(ab_ba, "position", AS3_Int(0));
        return AS3_Int(0);
    }
    int main()
    {
        AS3_Val romMethod = AS3_Function(NULL, sega_rom);
        AS3_Val initMethod = AS3_Function(NULL, sega_init);
        AS3_Val resetMethod = AS3_Function(NULL, sega_reset);
        AS3_Val frameMethod = AS3_Function(NULL, sega_frame);
        AS3_Val audioMethod = AS3_Function(NULL, sega_audio);
        // construct an object that holds references to the functions
        AS3_Val result = AS3_Object("sega_rom: AS3ValType, sega_init: AS3ValType, sega_reset: AS3ValType, sega_frame: AS3ValType, sega_audio: AS3ValType",
            romMethod, initMethod, resetMethod, frameMethod, audioMethod);
        // Release
        AS3_Release(romMethod);
        AS3_Release(initMethod);
        AS3_Release(resetMethod);
        AS3_Release(frameMethod);
        AS3_Release(audioMethod);
        // notify that we initialized -- THIS DOES NOT RETURN!
        AS3_LibInit(result);
        // should never get here!
        return 0;
    }

  • BUG: Large floating point numbers convert to the wrong integer

    Hi,
    When using the conversion "bullets" to convert SGL, DBL and EXT to integers there are some values which convert wrong. One example is the integer 9223370937343148030, which can be represented exactly as a SGL (and thus exactly as DBL and EXT as well). If you convert this to I64 you get 9223370937343148032 instead, even though the correct integer is within the range of an I64. There are many similar cases, all (I've noticed) within the large end of the ranges.
    This has nothing to do with which integers can be represented exactly as a floating point value or not. This is a genuine conversion bug mind you.
    Cheers,
    Steen
    CLA, CTA, CLED & LabVIEW Champion

    Yes, I understand the implications involved, and there definitely is a limit to how many significant digits can be displayed in the numeric controls and constants today. I think that either this limit should be lifted or a cap should be put on the configuration page when setting the display format.
    I ran into this problem as I'm developing a new toolset that lets you convert all the numeric formats into any other numeric format, just like the current "conversion bullets". My conversion bullets have outputs for overflow and exact conversion as well, since I need that functionality myself for a Math toolset (GPMath) I'm also developing. Eventually I'll maybe include underflow as well, but for now just those two outputs are available.
    I do of course pay close attention to the binary representation of the numbers to calculate the "Exact conversion?" output correctly for each conversion variation (there are hundreds of VIs in polymorphic wrappers), but in some cases I relied on the ability of the numeric indicator to show a true number when configured appropriately. That was when I discovered this display bug, which I at first mistook for a conversion error in LabVIEW.
    Is there a compliance issue with EXT?
    While doing this work I've discovered that the EXT format is somewhat misleadingly labelled as "80-bit IEEE compliant" (it says so here), but that statement should be read with some suspicion IMO. The LabVIEW EXT is not simply IEEE 754-1985 compliant anyway, as that format would imply the x87 80-bit extended format. An x87 IEEE 754 extended precision float only has a 63-bit fraction and a 1-bit integer part. That 1-bit integer part is implicit in single and double precision IEEE 754 numbers, but it is explicit in x87 extended precision numbers. LabVIEW EXT seems to have an implicit integer part and a 64-bit fraction, and is thus not straight IEEE 754 compliant. Instead I'd say that the LabVIEW EXT is an IEEE 754r extended format, but still a proprietary one that deserves a bit more detail in the available documentation. Since it's mentioned in several places in the LabVIEW documentation that the EXT is platform independent, your suspicion should already be high though. It didn't take me many minutes to verify the apparent format of the EXT in any case, so no real problem here.
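    The display effect can be reproduced outside LabVIEW. The following is a minimal Java sketch, not LabVIEW code: the class name `SglExactness` is made up for this illustration, and it assumes only that Java's `float` is IEEE 754 single precision like SGL. It suggests that 9223370937343148030 is not itself a single precision value. The nearest one is 9223370937343148032, which is 2^63 - 2^40, so a conversion to I64 that yields ...032 is the exact one.

```java
import java.math.BigDecimal;

final class SglExactness {
    /** Exact decimal expansion of a float, with no rounding by a formatter. */
    static String exactDecimal(float f) {
        // float -> double widening is exact, and BigDecimal(double) keeps every bit
        return new BigDecimal((double) f).toPlainString();
    }

    public static void main(String[] args) {
        long shown = 9223370937343148030L;   // the value as displayed with limited digits
        float sgl = (float) shown;           // rounds to the nearest single precision value
        long twoPow63MinusTwoPow40 = Long.MAX_VALUE - (1L << 40) + 1;

        System.out.println(exactDecimal(sgl));                    // exact value stored in the float
        System.out.println((long) sgl);                           // float -> 64-bit integer
        System.out.println((long) sgl == twoPow63MinusTwoPow40);  // same integer?
    }
}
```

    Printing through `BigDecimal` sidesteps any digit limit in the formatter, which is the part the numeric indicator got wrong.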
    Is there a genuine conversion error from EXT to U64?
    The integer 18446744073709549568 can be represented exactly as EXT using this binary representation (mind you that the numeric indicators won't display the value correctly, but instead show 18446744073709549600):
    EXT-exponent: 0x100000000111110b
    EXT-fraction: 0x1111111111111111111111111111111111111111111111111111000000000000b
    --> Decimal: 18446744073709549568
    The above EXT value converts exactly to U64 using the To Unsigned Quad Integer "bullet". But then let's flip the first 0 bit after the run of 1s in the fraction part of the EXT from 0 to 1, making this value:
    EXT-exponent: 0x100000000111110b
    EXT-fraction: 0x1111111111111111111111111111111111111111111111111111100000000000b
    --> Decimal: 18446744073709550592
    The above EXT value is still within U64 range, but the To Unsigned Quad Integer "bullet" converts it to U64_max which is 18446744073709551615. Unless I've missed something this must be a genuine conversion error from EXT to U64?
    /Steen
    CLA, CTA, CLED & LabVIEW Champion

  • 128-bit floating point numbers on new AMD quad-core Barcelona?

    There's quite a lot of buzz over at Slashdot about the new AMD quad core chips, announced yesterday:
    http://hardware.slashdot.org/article.pl?sid=07/02/10/0554208
    Much of the excitement is over the "new vector math unit referred to as SSE128", which is integrated into each [?!?] core; Tom Yager, of Infoworld, talks about it here:
    Quad-core Opteron? Nope. Barcelona is the completely redesigned x86, and it’s brilliant
    Now here's my question - does anyone know what the inputs and the outputs of this coprocessor look like? Can it perform arithmetic [or, God forbid, trigonometric] operations [in hardware] on 128-bit quad precision floats? And, if so, will LabVIEW be adding support for it? [Compare here versus here.]
    I found a little bit of marketing-speak blather at AMD about "SSE 128" in this old PDF Powerpoint-ish presentation, from June of 2006:
    http://www.amd.com/us-en/assets/content_type/DownloadableAssets/PhilHesterAMDAnalystDayV2.pdf
    WARNING: PDF DOCUMENT
    Page 13: "Dual 128-bit SSE dataflow, Dual 128-bit loads per cycle"
    Page 14: "128-bit SSE and 128-bit Loads, 128b FADD, 128 bit FMUL, 128b SSE, 128b SSE"
    etc etc etc
    While it's largely just gibberish to me, "FADD" looks like what might be a "floating point adder", and "FMUL" could be a "floating point multiplier", and God forbid that the two "SSE" units might be capable of computing some 128-bit cosines. But I don't know whether that old paper is even applicable to the chip that was released yesterday, and I'm just guessing as to what these things might mean anyway.
    Other than that, though, AMD's main website is strangely quiet about the Barcelona announcement. [Memo to AMD marketing - if you've just released the greatest thing since sliced bread, then you need to publicize the fact that you've just released the greatest thing since sliced bread...]

    I posted a query over at the AMD forums, and here's what I was told.
    I had hoped that e.g. "128b FADD" would be able to do something like the following:
    /* "quad" is a hypothetical 128-bit quad precision  */
    /* floating point number, similar to "long double"  */
    /* in recent versions of C++:                       */
    quad x, y, z;
    x = 1.000000000000000000000000000001;
    y = 1.000000000000000000000000000001;
    /* the hope was that "128b FADD" could perform the  */
    /* following 128-bit addition in hardware:          */
    z = x + y;
    However, the answer I'm getting is that "128b FADD" is just a set of two 64-bit adders running in parallel, which are capable of adding two vectors of 64-bit doubles more or less simultaneously:
    double x[2], y[2], z[2];
    x[0] = 1.000000000000000000000000000001;
    y[0] = 1.000000000000000000000000000001;
    x[1] = 2.000000000000000000000000000222;
    y[1] = 2.000000000000000000000000000222;
    /* Apparently the coordinates of the two "vectors" x & y       */
    /* can be sent to "128b FADD" in parallel, and the following   */
    /* two summations can be computed more or less simultaneously: */
    z[0] = x[0] + y[0];
    z[1] = x[1] + y[1];
    Thus e.g. "128b FADD", working in concert with "128b FMUL", will be able to [more or less] halve the amount of time it takes to compute a dot product of vectors whose coordinates are 64-bit doubles.
    So this "128-bit" circuitry is great if you're doing lots of linear algebra with 64-bit doubles, but it doesn't appear to offer anything in the way of greater precision for people who are interested in precision-sensitive calculations.
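    To make the difference concrete, here is a small Java sketch. It is an illustration only: `LanePrecision`, `addLanes` and `addWide` are made-up names, the loop stands in for what a two-lane vector adder does, and `MathContext.DECIMAL128` (34 decimal digits) is used merely as a stand-in for a wider format. It is a decimal format, not the binary quad format hoped for above.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class LanePrecision {
    /** Element-wise sum of two equal-length vectors: one ordinary 64-bit add per lane. */
    static double[] addLanes(double[] x, double[] y) {
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = x[i] + y[i];
        }
        return z;
    }

    /** The same kind of sum carried out with 34 significant decimal digits. */
    static BigDecimal addWide(String x, String y) {
        return new BigDecimal(x).add(new BigDecimal(y), MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        // The 1e-30 tails cannot be stored in a double, so they are already
        // gone before the addition happens.
        double[] x = {1.000000000000000000000000000001, 2.000000000000000000000000000222};
        double[] y = {1.000000000000000000000000000001, 2.000000000000000000000000000222};
        double[] z = addLanes(x, y);
        System.out.println(z[0] + " " + z[1]);

        // With enough digits the tail survives the addition.
        System.out.println(addWide("1.000000000000000000000000000001",
                                   "1.000000000000000000000000000001"));
    }
}
```

    Adding more lanes changes how many sums are done at once, not how many digits each sum carries.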
    By the way, if you're at all interested in questions of precision sensitivity & round-off error, I'd highly recommend Prof Kahan's page at Cal-Berzerkeley:
    http://www.cs.berkeley.edu/~wkahan/
    PDF DOCUMENT: How JAVA's Floating-Point Hurts Everyone Everywhere
    http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
    PDF DOCUMENT: Matlab's Loss is Nobody's Gain
    http://www.cs.berkeley.edu/~wkahan/MxMulEps.pdf

  • Single Precision Floating Point Numbers to Bytes

    OK, here is some code that I wrote a while back with some help from the support staff. It is designed to take in single precision floating point numbers that are stored as 4 bytes and convert them to a decimal value. It works off a UDP input string and also reformats the string. I can look at up to 4000 parameters from this one UDP string. Now I want to do the opposite of what I have written, and perhaps also get rid of the MATLAB I used in it. I would like to input a decimal value, have it converted into the 4-byte grouping that makes up this value, and then have those bytes inserted back into a single long string in the right order. A better explanation of what was done can be found on this website:
    http://www.jefflewis.net/XPlaneUDP_8.html
    The original code followed the "Single Precision Floating Point Numbers and Bytes" example on that site, but what I want to do is "Going from Single Precision Floating Point Numbers to Bytes". The site also explains the UDP string that is being represented. Also attached is the original code that I am simply trying to reverse.
    Attachments:
    x-plane_udp_master.vi ‏34 KB

    Perhaps what you are doing is an exercise in the programming of the math conversion of the bytes.
    But if you are just interested in getting the conversion done, why not use the typecast function?
    If the bytes happen to be in the wrong order for wherever you need to send the string, then you can use string functions to rearrange them.
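    For readers outside LabVIEW, the same reinterpretation can be sketched in Java. This is only an illustration of the idea behind a type cast, not the attached VI: `FloatBytes` and its methods are made-up names, and the byte order the receiving application expects is an assumption that has to be checked against its documentation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class FloatBytes {
    /** Packs a single precision value into its 4 IEEE 754 bytes in the given byte order. */
    static byte[] toBytes(float value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putFloat(value).array();
    }

    /** Reverse direction: rebuilds the float from 4 bytes in the given byte order. */
    static float fromBytes(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getFloat();
    }

    public static void main(String[] args) {
        byte[] packed = toBytes(1.0f, ByteOrder.LITTLE_ENDIAN);
        for (byte b : packed) {
            System.out.printf("%02X ", b);   // the four bytes, in sending order
        }
        System.out.println();
        System.out.println(fromBytes(packed, ByteOrder.LITTLE_ENDIAN));   // round trip
    }
}
```

    Switching between `ByteOrder.LITTLE_ENDIAN` and `ByteOrder.BIG_ENDIAN` plays the role of rearranging the bytes with string functions.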
    Message Edited by Ravens Fan on 10-02-2007 08:50 PM
    Attachments:
    Example_BD.png ‏3 KB

  • Designing for floating point error

    Hello,
    I am stuck with floating point errors and I'm not sure what to do. Specifically, I need to determine whether a point is inside a triangle or exactly on one of its edges. I use three cross products, each taking an edge as one vector and the vector from the edge's start to the query point as the other.
    The theory says that if the cross product is 0 then the point is directly on the line. If the cross product is <0, then the point is inside the triangle. If >0, then the point is outside the triangle.
    To account for the floating point error I was running into, I changed it from =0 to abs(cross_product)<1e-6.
    The trouble is, I run into cases where the algorithm is wrong and fails because there is a point which is classified as being on the edge of the triangle which isn't.
    I'm not really sure how to handle this.
    Thanks,
    Eric

    So, I changed epsilon from 1e-6 to 1e-10 and it seems to work better (I am using doubles, btw). However, that doesn't really solve the problem, it just buries it deeper. I'm interested in how actual commercial applications (such as video games or robots) deal with this issue. Obviously you don't see them giving you an error every time a floating point error messes something up. I think the issue here is that I am using data gathered from physical sensors, meaning the inputs can be arbitrarily close to each other. I am worried, though, that if I round the inputs I will get different data points with the exact same x and y values, and I'm not sure how the geometry algorithms will handle that. Also, I am creating a global navigation mesh of triangles with this data. Floating point errors that are not accounted for correctly lead to triangles inside one another (as opposed to adjacent to each other), which damages the integrity of the entire mesh, as it's hard to get your program to fix its own mistake.
    FYI:
    I am running java 1.6.0_20 in Eclipse Helios with Ubuntu 10.04x64
    Here is some code that didn't work using 1e-6 for delta. The test point new Point(-294.18294451166435,-25.496614108304477), is outside the triangle, but because of the delta choice it is seen as on the edge:
    class Point {
        double x, y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double getX() { return x; }

        double getY() { return y; }
    }

    class Edge {
        Point start, end;

        Edge(Point start, Point end) {
            this.start = start;
            this.end = end;
        }

        Point getStart() { return start; }

        Point getEnd() { return end; }
    }

    class Triangle {
        // cross products smaller than this in magnitude are treated as zero
        private static final double floatingPointDelta = 1e-6;

        Edge[] edges;

        Triangle(Edge a, Edge b, Edge c) {
            edges = new Edge[] { a, b, c };
        }

        public Point[] getOrderedPoints() throws Exception {
            Point[] points = new Point[3];
            points[0] = edges[0].getStart();
            points[1] = edges[0].getEnd();
            // the third vertex is the end of the second edge that is not shared with the first edge
            // (Point does not override equals(), so this compares references)
            if (edges[1].getStart().equals(points[0]) || edges[1].getStart().equals(points[1]))
                points[2] = edges[1].getEnd();
            else if (edges[1].getEnd().equals(points[0]) || edges[1].getEnd().equals(points[1]))
                points[2] = edges[1].getStart();
            else
                throw new Exception("MalformedTriangleException");
            orderNodes(points);
            return points;
        }

        /** Orders node1 node2 and node3 in clockwise order, more specifically
         * node1 is swapped with node2 if doing so will order the nodes clockwise
         * with respect to the other nodes.
         * Does not modify node1, node2, or node3; Modifies only the nodes reference
         * Note: "order" of nodes 1, 2, and 3 is clockwise when the path from point
         * 1 to 2 to 3 back to 1 travels clockwise on the circumcircle of points 1,
         * 2, and 3.
         */
        private void orderNodes(Point[] points) {
            //the K component (z axis) of the cross product a x b
            double xProductK = crossProduct(points[0], points[0], points[1], points[2]);
            /*          + (3)
             *         ^
             *        /
             *       B
             *      /
             * (1) + ------A-----> + (2)
             *
             * Graphical representation of vector A and B. 1, 2, and 3 are not in
             * clockwise order, and the x product of A and B is positive.
             */
            if (xProductK > 0) {
                //the cross product is positive so B is oriented as such with
                //respect to A and 1, 2, 3 are not clockwise in order.
                //swapping any 2 points in a triangle changes its "clockwise order"
                Point temp = points[0];
                points[0] = points[1];
                points[1] = temp;
            }
        }

        /**
         * Check if a point is inside this triangle
         * @param point The test point
         * @return 1 if the point is inside the triangle, 0 if the point is on an edge of the triangle, -1 if the point is not in the triangle
         * @throws Exception if the triangle is malformed
         */
        public int enclosesPointDetailed(Point point) throws Exception {
            Point[] points = getOrderedPoints();
            int cp1 = crossProduct(points[0], points[0], points[1], point);
            int cp2 = crossProduct(points[1], points[1], points[2], point);
            int cp3 = crossProduct(points[2], points[2], points[0], point);
            if (cp1 < 0 && cp2 < 0 && cp3 < 0)
                return 1;
            else if (cp1 <= 0 && cp2 <= 0 && cp3 <= 0)
                return 0;
            else
                return -1;
        }

        /** Sign of (end1 - start1) x (end2 - start2), with values within the delta counted as 0 */
        public static int crossProduct(Point start1, Point start2, Point end1, Point end2) {
            double crossProduct = (end1.getX()-start1.getX())*(end2.getY()-start2.getY())-(end1.getY()-start1.getY())*(end2.getX()-start2.getX());
            if (crossProduct > floatingPointDelta) {
                return 1;
            } else if (crossProduct < -floatingPointDelta) {
                return -1;
            } else {
                return 0;
            }
        }
    }

    class TriangleTest {
        public static void main(String[] args) throws Exception {
            Point a = new Point(-294.183483785282, -25.498196740397056);
            Point b = new Point(-294.18345625812026, -25.49859505161433);
            Point c = new Point(-303.88217906116796, -63.04183512930035);
            Edge aa = new Edge(a, b);
            Edge bb = new Edge(c, a);
            Edge cc = new Edge(b, c);
            Triangle aaa = new Triangle(aa, bb, cc);
            Point point = new Point(-294.18294451166435, -25.496614108304477);
            System.out.println(aaa.enclosesPointDetailed(point));
        }
    }
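
    On the question of how this is handled elsewhere: a common approach in computational geometry is to drop the fixed epsilon and instead compute the sign of the cross product exactly whenever the double result is too close to zero to trust. The following Java sketch shows the idea under stated assumptions. `Orientation` and `orient` are made-up names, the `1e-15` filter factor is a deliberately loose choice for this sketch rather than a published constant, and overflow, underflow and NaN inputs are ignored.

```java
import java.math.BigDecimal;

final class Orientation {
    /**
     * Sign of the cross product (q - p) x (r - p):
     * +1 if p, q, r turn counterclockwise, -1 if clockwise,
     * 0 only if the three points are exactly collinear.
     */
    static int orient(double px, double py, double qx, double qy, double rx, double ry) {
        double left = (qx - px) * (ry - py);
        double right = (qy - py) * (rx - px);
        double det = left - right;

        // Loose bound on the rounding error accumulated in det.
        double bound = 1e-15 * (Math.abs(left) + Math.abs(right));
        if (det > bound) return 1;
        if (det < -bound) return -1;

        // Too close to call in double precision: repeat the arithmetic exactly.
        // Every double converts to BigDecimal without error, and BigDecimal
        // subtraction and multiplication do not round.
        BigDecimal exactLeft = diff(qx, px).multiply(diff(ry, py));
        BigDecimal exactRight = diff(qy, py).multiply(diff(rx, px));
        return exactLeft.compareTo(exactRight);
    }

    private static BigDecimal diff(double a, double b) {
        return new BigDecimal(a).subtract(new BigDecimal(b));
    }

    public static void main(String[] args) {
        // Edge a -> b and the test point from the post above.
        System.out.println(orient(-294.183483785282, -25.498196740397056,
                                  -294.18345625812026, -25.49859505161433,
                                  -294.18294451166435, -25.496614108304477));
    }
}
```

    With an exact sign, "on the edge" means exactly collinear for the given coordinates, so two triangles that share an edge always agree about which side a point is on. The slow exact path only runs for the rare near-degenerate inputs.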

  • The BIG failure of the floating point computation

     The Big failure of the floating point computation .... Mmmm
    You are writing some code, a simple function, and using type Double for your computation.
    You then expect your function to return a result that is at least close to the exact value ...
    Are you right?
    Let's see this with an example.
    In my example, I will approximate the value of pi. To do so, I will inscribe a circle in a polygon, starting with a hexagon, and compute the half perimeter of the polygon. This gives me an approximation of pi.
    Then, in a loop, I will double the number of sides of that polygon, so each iteration gives me a better approximation.
    I will do this twice, using the same algorithm and the same equation, with the only difference that I will write the equation in two different forms.
    Since I don't want to throw equations and an algorithm at you without explanation, here is the idea:
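    Start with t = tan(30°) = 1/sqrt(3) for the hexagon, so that pi ≈ 6 * t. Each time the number of sides is doubled, t is replaced by the tangent of the half angle, and after n doublings pi ≈ 6 * 2^n * t:
        First form:   t_new = (sqrt(t^2 + 1) - 1) / t
        Second form:  t_new = t / (sqrt(t^2 + 1) + 1)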
    Simple enough ... 
    It is important to understand that the two forms of the equation are mathematically always equal for a given value of "t", since they are in fact the exact same equation written in two different ways.
    Now let's put these two equations in code:
    Public Class Form1
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            RichTextBox1.Font = New Font("Consolas", 9)
            RichTextBox2.Font = New Font("Consolas", 9)
            TextBox1.Font = New Font("Consolas", 12)
            TextBox1.TextAlign = HorizontalAlignment.Center
            TextBox1.Text = "3.14159265358979323846264338327..."
            Dim tt As Double
            Dim Pi As Double
            '===============================================================
            'Using First Form of the equation
            '===============================================================
            'start with an hexagon
            tt = 1 / Math.Sqrt(3)
            Pi = 6 * tt
            PrintPi(Pi, 0, RichTextBox1)
            'Now we will double the number of sides of the polygon 25 times
            For n = 1 To 25
                tt = (Math.Sqrt((tt ^ 2) + 1) - 1) / tt
                Pi = 6 * (2 ^ n) * tt
                PrintPi(Pi, n, RichTextBox1)
            Next
            '===============================================================
            'Using Second Form of the equation
            '===============================================================
            'start with an hexagon
            tt = 1 / Math.Sqrt(3)
            Pi = 6 * tt
            PrintPi(Pi, 0, RichTextBox2)
            'Now we will double the number of sides of the polygon 25 times
            For n = 1 To 25
                tt = tt / (Math.Sqrt((tt ^ 2) + 1) + 1)
                Pi = 6 * (2 ^ n) * tt
                PrintPi(Pi, n, RichTextBox2)
            Next
        End Sub

        Private Sub PrintPi(t As Double, n As Integer, RTB As RichTextBox)
            Dim S As String = t.ToString("#.00000000000000")
            RTB.AppendText(S & " " & Format((6 * (2 ^ n)), "#,##0").PadLeft(13) & " sides polygon")
            'Count the leading characters that match the reference value of pi.
            'The length checks stop the loop when every printed digit matches.
            Dim i As Integer = 0
            While i < S.Length AndAlso i < TextBox1.Text.Length AndAlso S(i) = TextBox1.Text(i)
                i += 1
            End While
            'Color the matching digits of the line just added in red
            Dim CS = RTB.GetFirstCharIndexFromLine(RTB.Lines.Count - 1)
            RTB.SelectionStart = CS
            RTB.SelectionLength = i
            RTB.SelectionColor = Color.Red
            RTB.AppendText(vbCrLf)
        End Sub
    End Class
    The results:
      The text box contains the real value of pi.
      The results on the left were obtained with the first form of the equation, those on the right with the second form.
      The red digits are the digits that are exact for pi.
    On the right, where we used the second form of the equation, we see that the result converges nicely toward pi as the number of sides of the polygon increases.
    But on the left, with the first form of the equation, we see that after just a few iterations the function stops converging and then starts diverging from the expected value.
    What is wrong ... did I make an error in the first form of the equation?
    Well, probably not, since this first form of the equation is the one you will find in your math book.
    So, what's up here?
    The problem is this:
         With the first form, each iteration subtracts 1 from the radical. As tt gets small, sqrt(tt^2 + 1) gets very close to 1, and a Double only carries about 16 significant digits, so the subtraction cancels the leading digits and leaves only the few digits of tt^2 that survived the rounding of the radical. Each iteration loses more precision this way.
      After only 25 iterations, the accumulated rounding error is so large that even the digit to the left of the decimal point is wrong.
    With the second form, I add 1 to the radical instead, so nothing cancels and there is no loss of precision.
    So, what should we learn from this?
       Well, we should at least remember that when using floating point to compute a formula, even a simple one as shown here, we should always check the accuracy of the result. Some forms of a formula cannot be evaluated accurately in floating point, even when a mathematically equivalent form can.
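
    The same experiment fits in a few lines of Java for anyone who wants to try it without a form. This is a sketch rather than a port of the program above: the class and method names are made up, and it uses `tt * tt` where the VB code uses `tt ^ 2`.

```java
final class PolygonPi {
    /** Half perimeter after n doublings, using tt = (sqrt(tt^2 + 1) - 1) / tt. */
    static double firstForm(int n) {
        double tt = 1 / Math.sqrt(3);   // hexagon
        for (int i = 0; i < n; i++) {
            // subtracting two nearly equal numbers: the leading digits cancel
            tt = (Math.sqrt(tt * tt + 1) - 1) / tt;
        }
        return 6 * Math.pow(2, n) * tt;
    }

    /** Half perimeter after n doublings, using tt = tt / (sqrt(tt^2 + 1) + 1). */
    static double secondForm(int n) {
        double tt = 1 / Math.sqrt(3);   // hexagon
        for (int i = 0; i < n; i++) {
            // addition of two positive numbers: nothing cancels
            tt = tt / (Math.sqrt(tt * tt + 1) + 1);
        }
        return 6 * Math.pow(2, n) * tt;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 25; n += 5) {
            System.out.printf("%2d  %.14f  %.14f%n", n, firstForm(n), secondForm(n));
        }
    }
}
```

    The two columns start out identical and drift apart as n grows, which is the behavior shown in the two result boxes.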

    I manually (yes, manually) did the computations with calc.exe. It has a higher accuracy. PI after 25 iterations is 3.1415926535897934934541990520762.
    This means tt = 0.000000015604459512183037864437694609544  compared to 0.0000000138636291675699 computed by the code.
    Armin
    Manually ... 
      You did better than Archimedes.   
      He only got to the 96 sides polygon and gave up. Then he said that PI was 3.1427

  • Floating point performance

    Hi, I'm trying to investigate floating point performance on an iPhone device vs. using fixed point math. Does the iPhone have a FPU? Are there any papers in the developer iPhone reference library that talk about this subject ( I did a search but nothing seemed to come up ). Anybody have any thoughts / comments about this?
    Any help would be great! Thanks!

    Unfortunately the ARM documentation is lousy and it is hard to estimate what would happen vis-a-vis the Thumb compile switch. I suspect Apple tested it thoroughly and determined that a majority of apps would benefit with it ON. But hey, go ahead and try it and please let us know.
    The FPU in the ARM is reasonably powerful, but somehow I doubt that the GCC compiler will generate any code worth a snot in terms of taking advantage of it. Likely you will have to dip into assembler to use it to advantage.
    I am a long time Microsoft Intel assembler fan - gotta love one of the best macro assemblers ever written by anybody. You can get a 4 times increase in your app if you hand code it, but most of us are limited by the operating system and memory vs. pure CPU. I suspect gamers will find the iPhone frustrating as we are all doing Objective-C, which is a super high overhead language, although not as bad as Java, which is a language best suited for snails.

  • Floating Point Problem

    Hi,
    I wish to know how I can round a floating point number to two significant digits?
    like 0.012234455 to 0.01?

    public class TaDa {
        public static void main(String[] args) {
            double percent = 2 / Math.PI * 100;
            System.out.printf("%.2f", percent);
        }
    }
    The string starting with % is a format string. The ".2" means you want 2 decimal places. The "f" means you are dealing with a floating point value.
    There is also String.format("%.2f", percent), which returns a String, the same one that is printed in the example. These methods became available with Java 1.5 and are documented here:
    http://java.sun.com/j2se/1.5.0/docs/api/java/util/Formatter.html
    (A similar printf is available in PHP.)
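
    Note that the question asks for two significant digits while its example (0.012234455 to 0.01) keeps two decimal places, which is what "%.2f" does. If the rounded number is needed as a value rather than as text, both readings can be sketched with BigDecimal. The class and method names below are made up for this illustration, and HALF_UP is an assumed rounding mode.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class Rounding {
    /** Rounds to a fixed number of digits after the decimal point. */
    static BigDecimal toDecimalPlaces(double value, int places) {
        return BigDecimal.valueOf(value).setScale(places, RoundingMode.HALF_UP);
    }

    /** Rounds to a fixed number of significant digits. */
    static BigDecimal toSignificantDigits(double value, int digits) {
        return BigDecimal.valueOf(value).round(new MathContext(digits, RoundingMode.HALF_UP));
    }

    public static void main(String[] args) {
        System.out.println(toDecimalPlaces(0.012234455, 2).toPlainString());      // two decimal places
        System.out.println(toSignificantDigits(0.012234455, 2).toPlainString());  // two significant digits
    }
}
```

    `BigDecimal.valueOf(double)` starts from the shortest decimal text of the double, so the rounding is applied to the digits a reader would expect.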

Maybe you are looking for