The BIG failure of floating point computation

The big failure of floating point computation .... Mmmm
You are writing some code ... a simple function ... and you are using type Double for your computation.
You then expect your function to return a result that is at least close to the exact value ...
Are you right??
Let's see this with an example.
In my example, I will approximate the value of pi. To do so, I will inscribe a circle in a polygon, starting with a hexagon, and compute the half-perimeter of the polygon. This will give me an approximation of pi.
Then, in a loop, I will double the number of sides of that polygon. Each iteration therefore gives me a better approximation.
I will do this twice, using the same algorithm and the same equation ... the only difference being that I will write that equation in two different forms.
Since I don't want to throw equations and an algorithm at you without explanation, here is the idea:
(I wrote that with Word to make it easier to read)
Start with a regular hexagon circumscribed around a circle of radius 1. With t(0) = tan(30°) = 1/sqrt(3), the half-perimeter of the hexagon is 6 * t(0) ≈ 3.46.

Doubling the number of sides replaces t(n), the tangent of the half-angle, using one of these two forms:

    First form:   t(n+1) = ( sqrt( t(n)^2 + 1 ) - 1 ) / t(n)
    Second form:  t(n+1) = t(n) / ( sqrt( t(n)^2 + 1 ) + 1 )

and after n doublings the half-perimeter of the polygon gives

    pi ≈ 6 * 2^n * t(n)
Simple enough ... 
It is important to understand that the two forms of the equation are mathematically equal for any given value of "t" ... it is in fact the exact same equation written in two different ways.
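If the equality is not obvious, multiply the numerator and the denominator of the first form by the conjugate sqrt(t^2 + 1) + 1:

$$
\frac{\sqrt{t^2+1}-1}{t}
= \frac{\bigl(\sqrt{t^2+1}-1\bigr)\bigl(\sqrt{t^2+1}+1\bigr)}{t\,\bigl(\sqrt{t^2+1}+1\bigr)}
= \frac{t^2}{t\,\bigl(\sqrt{t^2+1}+1\bigr)}
= \frac{t}{\sqrt{t^2+1}+1}
$$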
Now let's put these two equations into code.
Public Class Form1

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        RichTextBox1.Font = New Font("Consolas", 9)
        RichTextBox2.Font = New Font("Consolas", 9)
        TextBox1.Font = New Font("Consolas", 12)
        TextBox1.TextAlign = HorizontalAlignment.Center
        TextBox1.Text = "3.14159265358979323846264338327..."

        Dim tt As Double
        Dim Pi As Double

        '===============================================================
        'Using First Form of the equation
        '===============================================================
        'Start with a hexagon
        tt = 1 / Math.Sqrt(3)
        Pi = 6 * tt
        PrintPi(Pi, 0, RichTextBox1)
        'Now we will double the number of sides of the polygon 25 times
        For n = 1 To 25
            tt = (Math.Sqrt((tt ^ 2) + 1) - 1) / tt
            Pi = 6 * (2 ^ n) * tt
            PrintPi(Pi, n, RichTextBox1)
        Next

        '===============================================================
        'Using Second Form of the equation
        '===============================================================
        'Start with a hexagon
        tt = 1 / Math.Sqrt(3)
        Pi = 6 * tt
        PrintPi(Pi, 0, RichTextBox2)
        'Now we will double the number of sides of the polygon 25 times
        For n = 1 To 25
            tt = tt / (Math.Sqrt((tt ^ 2) + 1) + 1)
            Pi = 6 * (2 ^ n) * tt
            PrintPi(Pi, n, RichTextBox2)
        Next
    End Sub

    'Print the approximation and color in red the leading digits that match the reference value in TextBox1
    Private Sub PrintPi(t As Double, n As Integer, RTB As RichTextBox)
        Dim S As String = t.ToString("#.00000000000000")
        RTB.AppendText(S & " " & Format((6 * (2 ^ n)), "#,##0").PadLeft(13) & " sides polygon")
        Dim i As Integer = 0
        While S(i) = TextBox1.Text(i)
            i += 1
        End While
        Dim CS = RTB.GetFirstCharIndexFromLine(RTB.Lines.Count - 1)
        RTB.SelectionStart = CS
        RTB.SelectionLength = i
        RTB.SelectionColor = Color.Red
        RTB.AppendText(vbCrLf)
    End Sub

End Class
The results:
  The text box contains the real value of PI.
  The set of results on the left was obtained with the first form of the equation; the set on the right, with the second form.
  The red digits show the digits that are exact for pi.
On the right, where we used the second form of the equation, we see that the result converges nicely toward pi as the number of sides of the polygon increases.
But on the left, with the first form of the equation, we see that after just a few iterations the result stops converging and then starts diverging from the expected value.
What is wrong ... did I make an error in the first form of the equation?
Well, probably not, since this first form of the equation is the one you will find in your math book.
So, what's up here??
The problem is this:
     With the first form, at each iteration I subtract 1 from the radical. As "t" gets smaller, Sqrt(t^2 + 1) gets closer and closer to 1, so this subtraction cancels almost all of the significant digits and keeps only the last few digits of the radical, digits that are already polluted by rounding. A Double carries only about 15-16 significant digits in total, so that rounding error is huge compared to the tiny result, and it then gets amplified when I multiply by 6 * 2^n.
  After only 25 iterations, I have accumulated such a big rounding error that even the digit on the left of the decimal point is wrong.
When using the second form of the equation, I add 1 to the radical instead of subtracting it, so nothing cancels and I lose no precision.
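Here is the cancellation in isolation, as a small sketch (Java, just for illustration; the value of t is simply a small number like the ones that appear late in the loop):

    public class Cancellation {
        public static void main(String[] args) {
            // A small t, like the values that show up late in the iteration.
            double t = 1e-6;

            // First form: sqrt(t*t + 1) is extremely close to 1, so subtracting 1
            // cancels almost every significant digit that the sqrt produced.
            double firstForm = (Math.sqrt(t * t + 1) - 1) / t;

            // Second form: algebraically identical, but nothing is cancelled.
            double secondForm = t / (Math.sqrt(t * t + 1) + 1);

            System.out.println("first form : " + firstForm);   // only a few leading digits are right
            System.out.println("second form: " + secondForm);  // correct to full precision
        }
    }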
So, what should we learn from this?
   Well ... we should at least remember that, when using floating point to compute a formula, even a simple one as I show here, we should always check the accuracy of the result. Some formulas, written in their textbook form, simply cannot be evaluated accurately in floating point.

I manually (yes, manually) did the computations with calc.exe. It has much higher precision. PI after 25 iterations is 3.1415926535897934934541990520762.
This means tt = 0.000000015604459512183037864437694609544, compared to 0.0000000138636291675699 computed by the code.
Armin
Manually ...
  You did better than Archimedes.
  He only got to the 96-sided polygon and gave up. Then he said that PI was 3.1427
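If you want to repeat Armin's higher-precision check in code instead of by hand, here is a sketch (assuming Java 9 or later, since it uses BigDecimal.sqrt; the class name is just an example). With 50 significant digits, even the "bad" first form converges without trouble:

    import java.math.BigDecimal;
    import java.math.MathContext;

    public class PiCheck {
        public static void main(String[] args) {
            MathContext mc = new MathContext(50);                       // 50 significant digits
            BigDecimal one = BigDecimal.ONE;
            BigDecimal six = new BigDecimal(6);
            BigDecimal t = one.divide(new BigDecimal(3).sqrt(mc), mc);  // t0 = 1/sqrt(3), hexagon
            for (int n = 1; n <= 25; n++) {
                // First form: t = (sqrt(t^2 + 1) - 1) / t  -- harmless at 50 digits
                t = t.multiply(t, mc).add(one, mc).sqrt(mc).subtract(one, mc).divide(t, mc);
                BigDecimal pi = six.multiply(new BigDecimal(2).pow(n)).multiply(t, mc);
                System.out.println(n + "  " + pi.round(new MathContext(32)));
            }
        }
    }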

Similar Messages

  • How to take the floating point adder output

    I have a piece of code in VHDL here : Floating Point(IP Core) Adder.
    I want to take the result_fp to the output signal test. The result_fp is not affecting the test signal.
    Line 142 :  test <= STD_LOGIC_VECTOR(b_fp);
    I have tried with
    test <= STD_LOGIC_VECTOR(result_fp);
    as well as
    test <= result_fp;
    Neither works.
    How can I solve this?
    clockIn input is fed with a square wave of 50 ns period.
    clk_fp is fed with a square wave of 10 ns period(faster clock).
    Timing diagram:

  • Can I implement advanced control algorithm with floating-point computations in Ni 7831R ?

    Hi,
    I plan to use a Ni 7831R to control a MIMO nano-positioning stage with a servo rate above 20 kHz. The control algorithm is based on robust control design and is much more complex than PID. It also includes floating-point calculations. Can I implement such an algorithm with the Ni 7831R?
    By the way, is there any way to expand the FPGA gates number for Ni 7831R? Suppose I run out of the FPGA gates (1M), can I add more FPGA gates by buying some different hardware?
    Thanks
    Jingyan
    Message Edited by Jingyan on 08-22-2006 01:45 PM

    Jingyan,
    as long as there is no FPU core implemented on the FPGA, these devices only support integer arithmetic. NI's FPGA targets currently don't contain an FPU core, so there is no native floating point arithmetic available.
    Still there are several options to implement floating point arithmetic on your own or to work around this target specific limitation. Here are some links that might help:
    Floating-Point Addition in LabVIEW FPGA
    Multiplying, Dividing and Scaling in LabVIEW FPGA
    The NI 7831R uses a 1M FPGA. If your application requires more gates, the NI 7833R (3M) is a good solution.
    I hope that helps,
    Jochen Klier
    National Instruments Germany

  • How do I get the 68881/68882 floating point emulator?

    I already installed the LabVIEW 4.01 Full Development System on a Macintosh Power PC 6100/60 running Mac OS 7.5.
    I can't run LabVIEW, because an error message appears that says "LabVIEW is built to run on a Mac with Motorola 680x0 processors. It can be run on a Power Macintosh but requires the 68881/68882 floating point emulator software".
    So my question is: how do I get the floating point emulator?

    The emulator you are looking for will have to be retrieved from Apple. It is a piece of software that fools programs into believing you have hardware that is not present in your system. I am no Mac guru, so this is all I can tell you.
    I will warn you, however, that you are using a very outdated version of LabVIEW (the latest is 6.0.2).
    Kevin Kent

  • Floating Point Representations on SPARC (64-bit architecture)

    Hi Reader,
    I got hold of "Numerical Computation Guide -2005" by Sun while looking for Floating Point representations on 64 bit Architectures. It gives me nice illustrations of Single and Double formats and the solution for endianness with
    two 32-bit words. But it doesn't tell me how it is for 64-bit SPARC or 64-bit x86.
    I might be wrong here, but with all integers and pointers being 64 bits long, do we still need to break floating point numbers apart and store them in lower / higher order addresses?
    Or is it as simple as having a Double format with a consistent 1 + 11 + 52 bit pattern across all the architectures (Intel, SPARC, IBM PowerPC, AMD)?
    I have tried hard to get hold of documentation that explains the representation of a floating point number on a 64-bit architecture. Any suggestion would be very helpful.
    Thanks for reading. Hope you have something useful to write back.
    Regards,
    Regmee

    The representation of floating-point numbers is specified by IEEE standard 754. This standard contains the specifications for single-precision (32-bit), and double-precision (64-bit) floating-point numbers (There is also a quad-precision (128-bit) format as well). OpenSPARC T1 supports both single and double precision numbers, and can support quad-precision numbers through emulation (not in hardware). The fact that this is a 64-bit machine does not affect how the numbers are stored in memory.
    The only thing that affects how the numbers are stored in memory is endianness. The SPARC architecture is big-endian, while x86 is little-endian. But a double-precision floating-point number in a SPARC register looks the same as a double-precision floating-point number in an x86 register.
    formalGuy
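    To make the 1 + 11 + 52 layout concrete, here is a small Java sketch (my own illustration, not part of the reply above) that pulls the three fields out of a double via Double.doubleToRawLongBits:

    public class DoubleBits {
        public static void main(String[] args) {
            double d = -6.25;
            long bits = Double.doubleToRawLongBits(d);        // raw IEEE 754 encoding
            long sign     = (bits >>> 63) & 0x1L;             //  1 bit
            long exponent = (bits >>> 52) & 0x7FFL;           // 11 bits, biased by 1023
            long mantissa =  bits         & 0xFFFFFFFFFFFFFL; // 52 bits
            System.out.printf("sign=%d exponent=%d (unbiased %d) mantissa=0x%013X%n",
                    sign, exponent, exponent - 1023, mantissa);
        }
    }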

  • Floating point formats: Java/C/C++, PPC and Intel platforms

    Hi everyone
    Where can I find out about the various bit formats used for 32 bit floating numbers in Java and C/C++ for both Mac hardware platforms?
    I'm developing a Java audio application which needs to convert vast quantities of variable width integer audio samples to canonical float audio format. I've discovered that a floating point divide by the maximum integer value gives the correct answer but takes too much processor time, so I'm trying out bit-twiddling in C via JNI to carve out my own floating point bit patterns. This is very fast, however, I need to take into account the various float formats used on the different platforms so my app can be universal. Can anyone point me to the information?
    Thanks in advance.
    Bob

    I am not sure that Rosetta floating point works the same as PPC floating point. I was using RealBasic (a PPC Basic compiler) and moved one of my compiled applications to a MacBook Pro, and floating point comparisons that had been exact on the PPC stopped working under Rosetta. I changed the code to do an approximate comparison (i.e. abs(a - b) < tolerance) and this fixed things.
    I reported the problem to the RealBasic people and thought nothing more of it until I fired up Adobe's InDesign and not being used to working with picas, changed the units of measurement to inches. The default letter paper size was suddenly 8.5000500050005 inches instead of the more usual 8.5! This was not a big problem, but it appears that all of InDesign's page math is running into some kind of rounding errors.
    The floating point format is almost certainly IEEE, and I cannot imagine Rosetta doing anything other than using native hardware Intel floating point. On the other hand, there is a subtle difference in behavior.
    I am posting this here as a follow up, but I am also going to post this as a proper question in the forum. If you have to delete one or the other of these duplicate posts, please zap the reply, not the question.
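    Coming back to Bob's original question about the bit formats: in Java, float is defined to be IEEE 754 single precision on every platform, so the bit pattern of a given value is identical on PPC and Intel; only the byte order of raw memory differs. A tiny sketch (my illustration, not from the thread):

    public class FloatBits {
        public static void main(String[] args) {
            // IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits.
            // Java guarantees this representation on every platform.
            System.out.printf("1.0f  -> 0x%08X%n", Float.floatToIntBits(1.0f));   // 0x3F800000
            System.out.printf("-2.5f -> 0x%08X%n", Float.floatToIntBits(-2.5f));  // 0xC0200000

            // The canonical sample conversion from the question: 16-bit PCM to [-1.0, 1.0).
            short sample = -16384;
            float f = sample / 32768.0f;                                          // -0.5
            System.out.println(f);
        }
    }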

  • OS X Mavericks Contacts phone numbers shown as Floating point numbers

    After installing Mavericks on my MBP, I notice that some phone numbers in Contacts are shown as floating point numbers, and are unreadable. I am guessing that these are numbers which were previously preceded by "+".
    An example is 4.9161E+12, which is for a German contact, and the number would be +49 161...
    I can't see any way to change the format, and the floating point display is useless, as the last digits are lost.
    Would appreciate any advice on the matter.
    Many thanks,
    Michael.

    Music and pics are one way sync - computer to iphone. The iphone is not a storage device.
    Only itunes purchased music can be transfered. With iphone attached to itunes, click file>transfer purchases.
    Pics are optimized for viewing on iphone, reducing the quality of the pic on the iphone. Only pics taken with iphone and in the Camera Roll can be imported from iphone. This is done as with any other digital camera. You can e-mail the other pics to yourself one at a time. They will not be of the original quality though.
    You really should back up your info/pics/music as hard drives can and do fail.

  • Inline functions in C, gcc optimization and floating point arithmetic issues

    Over the last several days I have really become a fan of Alchemy. But after intensive testing I have found several issues which I'd like to solve, but I can't without some help.
    So... I'm porting an old game console emulator written by me in ANSI C. The code works on both gcc and Visual Studio without any modification or cross-compile macros. The only platform-specific code is the audio and video output, which is out of scope because I have ported audio and video within AS3.
    Here are the issues:
    1. Inline functions - Having even a single inline function makes the code work incorrectly (although not crash), whether optimization is enabled or not (-O0 or -O3). My current workaround is converting the inline functions to macros, which achieves the same effect. Any ideas why inline functions break the code?
    2. Compiler optimizations - well, my project consists of many C files, one of which is called flash.c; it contains the main and exported functions. I build the project as follows:
    gcc -c flash.c -O0 -o flash.o     //Please note the -O0 option!!!
    gcc -c file1.c -O3 -o file1.o
    gcc -c file2.c -O3 -o file2.o
    ... and so on
    gcc *.o -swc -O0 -o emu.swc   //Please note the -O0 option again!!!
    mxmlc.exe -library-path+=emu.swc --target-player=10.0.0 Emu.as
    for file in $( ls *.o )   # Removes the obj files
        do
            rm $file
        done
    If I define any option other than -O0 in gcc -c flash.c -O0 -o flash.o, the program stops working correctly, exactly as in the inline functions case (but it still does not crash or print any errors in debug). flash.c has 4 static functions to be exported to AS3 plus the main function. Do you know why?
    If I define any option other than -O0 in gcc *.o -swc -O0 -o emu.swc, the program stops working correctly exactly as above, but if I specify -O1, -O2 or -O3 the SWC file gets smaller, up to 2x for -O3. Why? Is there a method to optimize all the obj files except flash.o, because I suspect a similar issue as when compiling it?
    3. Floating point issues - this is the worst one. My code is mainly based on integer arithmetic, but in 1-2 places it requires floating point arithmetic. One of them is the conversion of a 16-bit 44.1 kHz sound buffer to a float buffer with the same sample rate but with samples in the range from -1.0 to 1.0.
    My code:
    void audio_prepare_as()
    {
        uint32 i;
        for (i = 0; i < audioSamples; i += 2)
        {
            audiobuffer[i]     = (float)snd.buffer[i] / 32768;
            audiobuffer[i + 1] = (float)snd.buffer[i + 1] / 32768;
        }
    }
    My audio playback is working perfectly, but not if I use the above conversion, and I have inspected the float numbers - all incorrect and invalid. I tried other code with simple floats - same story. As if Alchemy refuses to work with floats. What is wrong? I have another place where I must resize the framebuffer and there I have a float involved - same problem. Please help me.
    Found the floating point problem: audiobuffer is written to a ByteArray and then used in AS. But C floats are obviously not the same as those in AS3. Now the floating point is resolved.
    The optimization issues remain! I really need to speed up my code.
    Thank you in advance!

    Dear Bernd,
    I am still unable to run the optimizations and turn on the inline functions. None of the inline functions contain any stdlib function, just pure assignments, reads, simple arithmetic and bitwise operations.
    In fact, the file containing the main function and the functions for export to AS3 did have memset and memcpy. I tried your suggestion and put the code above the functions calling memset and memcpy. It did not work, so I put the code in a header which is included topmost in each C file. The only system header I use is malloc.h and it is included topmost. In another C file I use pow, sin and log10 from math.h, but I removed it and did the same thing:
    //shared.h
    #ifndef _SHARED_H_
    #define _SHARED_H_
    #include <malloc.h>
    static void * custom_memmove( void * destination, const void * source, unsigned int num ) {
      void *result;
      __asm__("%0 memmove(%1, %2, %3)\n" : "=r"(result) : "r"(destination), "r"(source), "r"(num));
      return result;
    }
    static void * custom_memcpy ( void * destination, const void * source, unsigned int num ) {
      void *result;
      __asm__("%0 memcpy(%1, %2, %3)\n" : "=r"(result) : "r"(destination), "r"(source), "r"(num));
      return result;
    }
    static void * custom_memset ( void * ptr, int value, unsigned int num ) {
      void *result;
      __asm__("%0 memset(%1, %2, %3)\n" : "=r"(result) : "r"(ptr), "r"(value), "r"(num));
      return result;
    }
    static float custom_pow(float x, int y) {
      float result;
      __asm__("%0 pow(%1, %2)\n" : "=r"(result) : "r"(x), "r"(y));
      return result;
    }
    static double custom_sin(double x) {
      double result;
      __asm__("%0 sin(%1)\n" : "=r"(result) : "r"(x));
      return result;
    }
    static double custom_log10(double x) {
      double result;
      __asm__("%0 log10(%1)\n" : "=r"(result) : "r"(x));
      return result;
    }
    #define memmove custom_memmove
    #define memcpy custom_memcpy
    #define memset custom_memset
    #define pow custom_pow
    #define sin custom_sin
    #define log10 custom_log10 
    #include "types.h"
    #include "macros.h"
    #include "m68k.h"
    #include "z80.h"
    #include "genesis.h"
    #include "vdp.h"
    #include "render.h"
    #include "mem68k.h"
    #include "memz80.h"
    #include "membnk.h"
    #include "memvdp.h"
    #include "system.h"
    #include "loadrom.h"
    #include "input.h"
    #include "io.h"
    #include "sound.h"
    #include "fm.h"
    #include "sn76496.h" 
    #endif /* _SHARED_H_ */ 
    It still behaves the same way as if nothing was changed (works incorrectly - displays junk which does not move, whereas the image is supposed to move).
    As I am porting an emulator (Sega Mega Drive), I use many arrays of function pointers for implementing the opcodes of the CPUs. Could this be an issue?
    I did a workaround for the floating point problem but processing is very slow, so I hear only bzzt bzzt, but this is for now out of scope. The emulator compiled with gcc runs at 300 fps on a 1.3 GHz machine, whereas my non-optimized AVM2 code compiled by Alchemy produces 14 fps. The pure rendering is super fast and the problem lies in the computational power of the AVM. The frame buffer and the emulation are generated in the C code and only the pixels are copied to AS3, where they are plotted in a BitmapData. On a 2.0 GHz dual core I achieved only 21 fps. The goal is 60 fps to have smooth audio and video. But this is off topic. After all, everything works (slowly) without optimization, and I would like to somehow turn it on. Suggestions?
    Here is the file with the main function:
    #include "shared.h"
    #include "AS3.h"
    #define FRAMEBUFFER_LENGTH    (320*240*4)
    static uint8* framebuffer;
    static uint32  audioSamples;
    AS3_Val sega_rom(void* self, AS3_Val args)
        int size, offset, i;
        uint8 hardware;
        uint8 country;
        uint8 header[0x200];
        uint8 *ptr;
        AS3_Val length;
        AS3_Val ba;
        AS3_ArrayValue(args, "AS3ValType", &ba);
        country = 0;
        offset = 0;
        length = AS3_GetS(ba, "length");
        size = AS3_IntValue(length);
        ptr = (uint8*)malloc(size);
        AS3_SetS(ba, "position", AS3_Int(0));
        AS3_ByteArray_readBytes(ptr, ba, size);
        //FILE* f = fopen("boris_dump.bin", "wb");
        //fwrite(ptr, size, 1, f);
        //fclose(f);
        if((size / 512) & 1)
            size -= 512;
            offset += 512;
            memcpy(header, ptr, 512);
            for(i = 0; i < (size / 0x4000); i += 1)
                deinterleave_block(ptr + offset + (i * 0x4000));
        memset(cart_rom, 0, 0x400000);
        if(size > 0x400000) size = 0x400000;
        memcpy(cart_rom, ptr + offset, size);
        /* Free allocated file data */
        free(ptr);
        hardware = 0;
        for (i = 0x1f0; i < 0x1ff; i++)
            switch (cart_rom[i]) {
         case 'U':
             hardware |= 4;
             break;
         case 'J':
             hardware |= 1;
             break;
         case 'E':
             hardware |= 8;
             break;
        if (cart_rom[0x1f0] >= '1' && cart_rom[0x1f0] <= '9') {
            hardware = cart_rom[0x1f0] - '0';
        } else if (cart_rom[0x1f0] >= 'A' && cart_rom[0x1f0] <= 'F') {
            hardware = cart_rom[0x1f0] - 'A' + 10;
        if (country) hardware=country; //simple autodetect override
        //From PicoDrive
        if (hardware&8)        
            hw=0xc0; vdp_pal=1;
        } // Europe
        else if (hardware&4)    
            hw=0x80; vdp_pal=0;
        } // USA
        else if (hardware&2)    
            hw=0x40; vdp_pal=1;
        } // Japan PAL
        else if (hardware&1)      
            hw=0x00; vdp_pal=0;
        } // Japan NTSC
        else
            hw=0x80; // USA
        if (vdp_pal) {
            vdp_rate = 50;
            lines_per_frame = 312;
        } else {
            vdp_rate = 60;
            lines_per_frame = 262;
        /*SRAM*/   
        if(cart_rom[0x1b1] == 'A' && cart_rom[0x1b0] == 'R')
            save_start = cart_rom[0x1b4] << 24 | cart_rom[0x1b5] << 16 |
                cart_rom[0x1b6] << 8  | cart_rom[0x1b7] << 0;
            save_len = cart_rom[0x1b8] << 24 | cart_rom[0x1b9] << 16 |
                cart_rom[0x1ba] << 8  | cart_rom[0x1bb] << 0;
            // Make sure start is even, end is odd, for alignment
            // A ROM that I came across had the start and end bytes of
            // the save ram the same and wouldn't work.  Fix this as seen
            // fit, I know it could probably use some work. [PKH]
            if(save_start != save_len)
                if(save_start & 1) --save_start;
                if(!(save_len & 1)) ++save_len;
                save_len -= (save_start - 1);
                saveram = (unsigned char*)malloc(save_len);
                // If save RAM does not overlap main ROM, set it active by default since
                // a few games can't manage to properly switch it on/off.
                if(save_start >= (unsigned)size)
                    save_active = 1;
            else
                save_start = save_len = 0;
                saveram = NULL;
        else
            save_start = save_len = 0;
            saveram = NULL;
        return AS3_Int(0);
    AS3_Val sega_init(void* self, AS3_Val args)
        system_init();
        audioSamples = (44100 / vdp_rate)*2;
        framebuffer = (uint8*)malloc(FRAMEBUFFER_LENGTH);
        return AS3_Int(vdp_rate);
    AS3_Val sega_reset(void* self, AS3_Val args)
        system_reset();
        return AS3_Int(0);
    AS3_Val sega_frame(void* self, AS3_Val args)
        uint32 width;
        uint32 height;
        uint32 x, y;
        uint32 di, si, r;
        uint16 p;
        AS3_Val fb_ba;
        AS3_ArrayValue(args, "AS3ValType", &fb_ba);
        system_frame(0);
        AS3_SetS(fb_ba, "position", AS3_Int(0));
        width = (reg[12] & 1) ? 320 : 256;
        height = (reg[1] & 8) ? 240 : 224;
        for(y=0;y<240;y++)
            for(x=0;x<320;x++)
                di = 1280*y + x<<2;
                si = (y << 10) + ((x + bitmap.viewport.x) << 1);
                p = *((uint16*)(bitmap.data + si));
                framebuffer[di + 3] = (uint8)((p & 0x1f) << 3);
                framebuffer[di + 2] = (uint8)(((p >> 5) & 0x1f) << 3);
                framebuffer[di + 1] = (uint8)(((p >> 10) & 0x1f) << 3);
        AS3_ByteArray_writeBytes(fb_ba, framebuffer, FRAMEBUFFER_LENGTH);
        AS3_SetS(fb_ba, "position", AS3_Int(0));
        r = (width << 16) | height;
        return AS3_Int(r);
    AS3_Val sega_audio(void* self, AS3_Val args)
        AS3_Val ab_ba;
        AS3_ArrayValue(args, "AS3ValType", &ab_ba);
        AS3_SetS(ab_ba, "position", AS3_Int(0));
        AS3_ByteArray_writeBytes(ab_ba, snd.buffer, audioSamples*sizeof(int16));
        AS3_SetS(ab_ba, "position", AS3_Int(0));
        return AS3_Int(0);
    int main()
        AS3_Val romMethod = AS3_Function(NULL, sega_rom);
        AS3_Val initMethod = AS3_Function(NULL, sega_init);
        AS3_Val resetMethod = AS3_Function(NULL, sega_reset);
        AS3_Val frameMethod = AS3_Function(NULL, sega_frame);
        AS3_Val audioMethod = AS3_Function(NULL, sega_audio);
        // construct an object that holds references to the functions
        AS3_Val result = AS3_Object("sega_rom: AS3ValType, sega_init: AS3ValType, sega_reset: AS3ValType, sega_frame: AS3ValType, sega_audio: AS3ValType",
            romMethod, initMethod, resetMethod, frameMethod, audioMethod);
        // Release
        AS3_Release(romMethod);
        AS3_Release(initMethod);
        AS3_Release(resetMethod);
        AS3_Release(frameMethod);
        AS3_Release(audioMethod);
        // notify that we initialized -- THIS DOES NOT RETURN!
        AS3_LibInit(result);
        // should never get here!
        return 0;

  • Floating point multiplication

    hello everybody!
    I use OpenSPARC T1. In floating point multiplication, where are the upper 64 bits (bits 64 to 128) computed and stored? ... in the FPU, or does it use the SPU?
    thanx in advance

    Hi,
    According to the OpenSPARC T1 micro-architecture specification (page 204):
    The FPU includes three independent execution pipelines:
    Floating-point adder (FPA) – adds, subtracts, compares, conversions
    Floating-point multiplier (FPM) – multiplies
    Floating-point divider (FPD) – divides
    However, keep in mind that all the registers for the floating point operations are kept in the cores.
    This is what the specs (page 31) say about the SPU: "Stream processing unit (SPU) is used for modular arithmetic functions for crypto."

  • BigDecimal vs floating points...

    Hi all,
    I know it's probably been asked a million times before, but I need to finally fully understand and get my head around the two.
    Firstly, here are some bits I've been told by different people and read in different places (a lot of people seem to think differently, which is what confuses me):
    - I've read that if you want precision, for currency for example, floating point shouldn't be used, because it can't represent every decimal number exactly.
    - Then some people have told me that it doesn't matter and that most of the time there's not much point in BigDecimal: all you need to do is correct the floating point with formatting.
    - I've asked about this before, but people just seem to give me a short answer without actually explaining why or where they get it from; you can't just assume an answer based on nothing...
    I'm building some engineering software that has a general accuracy of 3 decimal places (millimeters from meters), and my first thought is that if currency at 2 decimal places requires BigDecimal then I surely require it (I can't afford to be missing mm on every calculation, and there are a lot!). The problem is that this has resulted in me building pretty much the whole application with BigDecimal, which, as you can imagine, brings up thoughts about performance and memory usage. I do calculations with BigDecimal, store data in BigDecimal, and in fact the only thing I do in double is the graphical display, where the accuracy isn't so important.
    My last question: if this is an OK way to build an accurate application, it makes me wonder why floating point is used more than BigDecimal; surely most numbers are required to be accurate in applications, especially of an enterprise scale?
    Thanks,
    Ken

    MarksmanKen wrote:
    "So you're a big user of BigDecimal as well then? That's good to know someone else thinks in similar ways, I was starting to feel like a bit of an idiot for using them so extensively lol"
    Not at all. The idiots are the people who use primitives rather than BigDecimal "because they're faster" even though they've never actually experienced any performance problems. Of course, there are lots of cases where the speed of a primitive is preferable, but on the whole those guys know perfectly well who they are and what they're doing.
    "My program is very calculation heavy and I've not had any real performance issues yet, but I was wondering if the performance gain would be significant enough while keeping the accuracy."
    Testing will show you the way. Don't let any "we tested this calculation a million times using primitives and the same one using BigDecimal, and it showed a remarkable 3 seconds quicker using primitives" sidetrack you, either. All that matters is that your actual production code is performant enough for your application. Generally speaking, anything involving currency will probably be better using BigDecimal, or, really, a Money class which happens to use BigDecimal under the covers. Quite why enterprise-targeted languages don't have some sort of native Money or Currency class out-of-the-box remains a mystery, to be honest.
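    A tiny Java illustration of the representation issue discussed above (not from the original thread): 0.1 and 0.2 have no exact binary representation, while a BigDecimal built from Strings keeps them exact:

    import java.math.BigDecimal;

    public class DecimalDemo {
        public static void main(String[] args) {
            // double cannot represent 0.1 exactly, so small errors creep in
            double d = 0.1 + 0.2;
            System.out.println(d);              // 0.30000000000000004
            System.out.println(d == 0.3);       // false

            // BigDecimal created from Strings keeps the decimal values exact
            BigDecimal b = new BigDecimal("0.1").add(new BigDecimal("0.2"));
            System.out.println(b);                                        // 0.3
            System.out.println(b.compareTo(new BigDecimal("0.3")) == 0);  // true
        }
    }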

  • Converting Floating Point

    I am receiving numbers from a Tinius Olsen Model 290 Universal Testing System. I made a little program to send commands to the machine and receive the responses back as strings. For the decimal value 3.7620 I get the floating point number EDC57040. This is obviously not correct. The correct floating point representation is 4070C5ED, which is obtained by reversing the byte order. I need to know how to do this in LabVIEW: basically go from the string representation of the floating point number to the correctly swapped floating point value.

    Here is a simple example:
    Be aware that if you have an ancient version of LabVIEW, this option (byte order) is not available and you need to swap the bytes manually. No big deal, really.
    Message Edited by altenbach on 05-08-2007 01:15 PM
    LabVIEW Champion . Do more with less code and in less time .
    Attachments:
    LittleEndianSGL.png ‏7 KB
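    For anyone doing the same thing outside LabVIEW, the byte-order swap itself is a one-liner; here is a hedged Java sketch of the idea (not the poster's LabVIEW solution):

    public class SwapBytes {
        public static void main(String[] args) {
            int received = 0xEDC57040;                       // bytes as they arrive from the instrument
            int swapped  = Integer.reverseBytes(received);   // 0x4070C5ED
            float value  = Float.intBitsToFloat(swapped);
            System.out.printf("0x%08X -> 0x%08X -> %f%n", received, swapped, value);  // ~3.7620
        }
    }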

  • Floating Point # in MIDP & CLDC...

    What is the way to use floating point numbers in MIDP & CLDC?
    MIDP and CLDC have no built-in support for them.
    Please help me...

    simple.... Don't :P
    you've got fixed point libs available, but they're not exactly fast.
    It is possible to recompile the KVM and enable floating point support, but none of the embedded implementations have this support, so it's a pointless exercise.
    What do you need floating point for?
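    To make the "fixed point libs" suggestion concrete, here is a rough 16.16 fixed-point sketch (plain Java int/long math, which CLDC does support); it is only an illustration of the idea, not one of the existing libraries:

    public final class Fixed {
        // 16.16 fixed point: a real value x is stored as the int round(x * 65536).
        static int fromInt(int i)    { return i << 16; }
        static int mul(int a, int b) { return (int) (((long) a * b) >> 16); }
        static int div(int a, int b) { return (int) (((long) a << 16) / b); }

        public static void main(String[] args) {
            int threeHalves = div(fromInt(3), fromInt(2));  // 1.5 -> 98304
            int area = mul(threeHalves, fromInt(4));        // 6.0 -> 393216
            System.out.println(area + " (= " + (area >> 16) + " + " + (area & 0xFFFF) + "/65536)");
        }
    }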

  • Precision loss - conversions between exact values and floating point values

    Hi!
    I read this in your SQL Reference manual, but I don't quite get it.
    Conversions between exact numeric values (TT_TINYINT, TT_SMALLINT, TT_INTEGER, TT_BIGINT, NUMBER) and floating-point values (BINARY_FLOAT, BINARY_DOUBLE) can be inexact because the exact numeric values use decimal precision whereas the floating-point numbers use binary precision.
    Could you please give two examples: one where a TT_TINYINT is converted to a BINARY_DOUBLE and one where a TT_BIGINT is converted to a DOUBLE, with both cases giving examples of lost precision? This would be very helpful.
    Thanks!
    Sune

    chokpa wrote:
    "Public Example (float... values){}
    new Example (1, 1e2, 3.0, 4.754);
    It accepts it if I just use 1,2,3,4 as the values being passed in, but doesn't like it if I use actual float values."
    Those are double literals; try
    new Example (1f, 1e2f, 3.0f, 4.754f);
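    Back to Sune's actual question: the loss shows up when the integer needs more significant bits than the floating-point type's mantissa can hold. Assuming TT_BIGINT behaves like a 64-bit integer and BINARY_DOUBLE like an IEEE 754 double (a TT_TINYINT, by contrast, is small enough to always convert exactly), a Java analogue looks like this:

    public class PrecisionLoss {
        public static void main(String[] args) {
            // A double has a 53-bit significand, so 64-bit integers above 2^53
            // cannot all be represented exactly.
            long big = (1L << 53) + 1;           // 9007199254740993
            double d = (double) big;             // rounds to 9007199254740992
            System.out.println(big);
            System.out.println((long) d);        // one less: the last digit is lost
            System.out.println(big == (long) d); // false
        }
    }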

  • Floating-point numbers: min value

    Hi,
    the number wrapper classes each define a MAX_VALUE and a MIN_VALUE constant. While the MIN_VALUE of the non-floating point types is a negative number, the MIN_VALUE of the floating point types is the smallest positive number.
    From the Javadoc:
    Float:
    MAX_VALUE:
    A constant holding the largest positive finite value of type float, (2 - 2^-23) * 2^127. It is equal to the hexadecimal floating-point literal 0x1.fffffeP+127f and also equal to Float.intBitsToFloat(0x7f7fffff).
    MIN_VALUE:
    A constant holding the smallest positive nonzero value of type float, 2^-149. It is equal to the hexadecimal floating-point literal 0x0.000002P-126f and also equal to Float.intBitsToFloat(0x1).
    Double:
    MAX_VALUE:
    A constant holding the largest positive finite value of type double, (2 - 2^-52) * 2^1023. It is equal to the hexadecimal floating-point literal 0x1.fffffffffffffP+1023 and also equal to Double.longBitsToDouble(0x7fefffffffffffffL).
    MIN_VALUE:
    A constant holding the smallest positive nonzero value of type double, 2^-1074. It is equal to the hexadecimal floating-point literal 0x0.0000000000001P-1022 and also equal to Double.longBitsToDouble(0x1L).
    Can someone tell me the MAX_NEGATIVE_VALUE (the finite negative value with the largest absolute value) and the MIN_NEGATIVE_VALUE (the negative value with the smallest nonzero absolute value) of float and double (using the xxxBitsToXxx-methods)?
    Thanks!
    -Puce

    The IEEE754 format is symmetric with respect to the sign, so -MAX_VALUE
    and -MIN_VALUE are the values you're looking for.
    kind regards,
    Jos
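    Expressed with the xxxBitsToXxx methods the original post asked about, those two negative extremes look like this (the bit patterns are simply the MAX_VALUE/MIN_VALUE patterns quoted above with the sign bit set):

    public class NegativeExtremes {
        public static void main(String[] args) {
            // Largest-magnitude finite negative values: MAX_VALUE with the sign bit set.
            double minNegDouble = Double.longBitsToDouble(0xFFEFFFFFFFFFFFFFL); // == -Double.MAX_VALUE
            float  minNegFloat  = Float.intBitsToFloat(0xFF7FFFFF);             // == -Float.MAX_VALUE

            // Negative values closest to zero: MIN_VALUE with the sign bit set.
            double maxNegDouble = Double.longBitsToDouble(0x8000000000000001L); // == -Double.MIN_VALUE
            float  maxNegFloat  = Float.intBitsToFloat(0x80000001);             // == -Float.MIN_VALUE

            System.out.println(minNegDouble + "  " + minNegFloat);
            System.out.println(maxNegDouble + "  " + maxNegFloat);
        }
    }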

  • Designing for floating point error

    Hello,
    I am stuck with floating point errors and I'm not sure what to do. Specifically, I need to determine whether a point is inside a triangle or exactly on one of its edges. I use three cross products, with the edge as one vector and the vector from the edge start to the query point as the other.
    The theory says that if the cross product is 0 then the point is directly on the line. If the cross product is <0, then the point is inside the triangle. If >0, then the point is outside the triangle.
    To account for the floating point error I was running into, I changed it from =0 to abs(cross_product)<1e-6.
    The trouble is, I run into cases where the algorithm is wrong and fails because a point gets classified as being on the edge of the triangle when it isn't.
    I'm not really sure how to handle this.
    Thanks,
    Eric

    So, I changed epsilon from 1e-6 to 1e-10 and it seems to work better (I am using doubles, btw). However, that doesn't really solve the problem, it just buries it deeper. I'm interested in how actual commercial applications (such as video games or robots) deal with this issue. Obviously you don't see them giving you an error every time a floating point error messes something up.
    I think the issue here is that I am using data gathered from physical sensors, meaning the inputs can be arbitrarily close to each other. I am worried, though, that if I round the inputs I will get different data points with the exact same x and y value, and I'm not sure how the geometry algorithms will handle that.
    Also, I am creating a global navigation mesh of triangles with this data. Floating point errors that are not accounted for correctly lead to triangles inside one another (as opposed to adjacent to each other), which damages the integrity of the entire mesh, as it's hard to get your program to fix its own mistake.
    FYI:
    I am running java 1.6.0_20 in Eclipse Helios with Ubuntu 10.04x64
    Here is some code that didn't work using 1e-6 for delta. The test point new Point(-294.18294451166435, -25.496614108304477) is outside the triangle, but because of the delta choice it is seen as on the edge:
    class Point
         double x,y;
    class Edge
         Point start, end;
    class Triangle
         Edge[] edges;
         public Point[] getOrderedPoints() throws Exception{
              Point[] points = new Point[3];
              points[0]=edges[0].getStart();
              points[1]=edges[0].getEnd();
              if (edges[1].getStart().equals(points[0]) || edges[1].getStart().equals(points[1]))
                   points[2]=edges[1].getEnd();
              else if (edges[1].getEnd().equals(points[0]) || edges[1].getEnd().equals(points[1]))
                   points[2]=edges[1].getStart();
              else
                   throw new Exception("MalformedTriangleException\n"+this.print());
              orderNodes(points);
              return points;
            /** Orders node1 node2 and node3 in clockwise order, more specifically
          * node1 is swapped with node2 if doing so will order the nodes clockwise
          * with respect to the other nodes.
          * Does not modify node1, node2, or node3; Modifies only the nodes reference
          * Note: "order" of nodes 1, 2, and 3 is clockwise when the path from point
          * 1 to 2 to 3 back to 1 travels clockwise on the circumcircle of points 1,
          * 2, and 3.
         private void orderNodes(Point[] points){
              //the K component (z axis) of the cross product a x b
              double xProductK = crossProduct(points[0],points[0], points[1], points[2]);
              /*        (3)
               *          +
               *        ^
               *      B
               * (1)+             + (2)
               *       ------A-->
               * Graphical representation of vector A and B. 1, 2, and 3 are not in
               * clockwise order, and the x product of A and B is positive.
              if(xProductK > 0)
                   //the cross product is positive so B is oriented as such with
                   //respect to A and 1, 2, 3 are not clockwise in order.
                   //swapping any 2 points in a triangle changes its "clockwise order"
                   Point temp = points[0];
                   points[0] = points[1];
                   points[1] = temp;
    class TriangleTest
         private double delta = 1e-6;
         public static void main(String[] args)  {
                    Point a = new Point(-294.183483785282, -25.498196740397056);
              Point b = new Point(-294.18345625812026, -25.49859505161433);
              Point c = new Point(-303.88217906116796, -63.04183512930035);
              Edge aa = new Edge (a, b);
              Edge bb = new Edge (c, a);
              Edge cc = new Edge (b, c);
              Triangle aaa = new Triangle(aa, bb, cc);
              Point point = new Point(-294.18294451166435,-25.496614108304477);
              System.out.println(aaa.enclosesPointDetailed(point));
          * Check if a point is inside this triangle
          * @param point The test point
          * @return     1 if the point is inside the triangle, 0 if the point is on a triangle, -1 if the point is not is the triangle
          * @throws MalformedTriangleException
         public int enclosesPointDetailed(LocalPose point, boolean verbose) throws Exception
              Point[] points = getOrderedPoints();          
              int cp1 = crossProduct(points[0], points[0], points[1], point);
              int cp2 = crossProduct(points[1], points[1], points[2], point);
              int cp3 = crossProduct(points[2], points[2], points[0], point);
              if (cp1 < 0 && cp2 <0  && cp3 <0)
                   return 1;
              else if (cp1 <=0 && cp2 <=0  && cp3 <=0)
                   return 0;
              else
                   return -1;
         public static int crossProduct(Point start1, Point start2, Point end1, Point end2){
              double crossProduct = (end1.getX()-start1.getX())*(end2.getY()-start2.getY())-(end1.getY()-start1.getY())*(end2.getX()-start2.getX());
              if (crossProduct>floatingPointDelta){
                   return 1;
              else if (Math.abs(crossProduct)<floatingPointDelta){
                   return 0;
              else{
                   return -1;
    }
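    One mitigation worth trying (a sketch of my own, not from the thread): scale the tolerance to the magnitude of the products being subtracted, so the test behaves consistently whether the coordinates are near zero or in the hundreds:

    public class Orientation {
        // Returns +1, 0, or -1 for the cross-product sign test.
        // The 1e-12 factor is an assumption, not a universal constant.
        static int orientation(double ax, double ay, double bx, double by,
                               double cx, double cy) {
            double term1 = (bx - ax) * (cy - ay);
            double term2 = (by - ay) * (cx - ax);
            double cross = term1 - term2;
            // Scale the epsilon by the size of the terms that were subtracted.
            double eps = 1e-12 * (Math.abs(term1) + Math.abs(term2));
            if (cross > eps)  return 1;
            if (cross < -eps) return -1;
            return 0;
        }

        public static void main(String[] args) {
            // The troublesome query point from the post, tested against edge a-b.
            System.out.println(orientation(-294.183483785282, -25.498196740397056,
                                           -294.18345625812026, -25.49859505161433,
                                           -294.18294451166435, -25.496614108304477));
        }
    }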
