Floating point arithmetic

Hi everybody,
This line:
System.out.println((0.1+0.7)*10); outputs 7.999999999999999
This is due to how floating point numbers are stored. When writing
code, sometimes it behaves as intended and sometimes it doesn't
(like the line above). Is there a way to "predict" when the code is OK and
when it isn't? Are there any tips for getting around this kind
of problem?
Cheers,
Adrian

No. Using BigDecimal just because you don't understand how floating-point numbers work would be... um... short-sighted. And it wouldn't help, either. As soon as you divide 1 by 3 then you have to know how decimal numbers work, which is essentially the same problem.
Edit: I forgot the forum hasn't been automated to provide the mandatory link for people who ask this question. We still have to do it by hand.
http://docs.sun.com/source/806-3568/ncg_goldberg.html
Edited by: DrClap on Oct 11, 2007 3:02 PM
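
The two points in the reply can be checked with a short Java sketch. It prints the exact binary values that are stored for 0.1 and 0.7, compares the result against 8.0 with a relative tolerance instead of ==, and shows that BigDecimal cannot divide 1 by 3 without being told how to round. The class name, the helper names and the tolerance of 1e-12 are illustrative assumptions, not something from the thread.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class FloatDemo {
    /** Compares two doubles with a relative tolerance instead of ==. */
    static boolean nearlyEqual(double a, double b, double relTol) {
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }

    /** Returns true if exact decimal division 1/3 fails when no rounding rule is given. */
    static boolean oneThirdNeedsRounding() {
        try {
            BigDecimal.ONE.divide(new BigDecimal(3));
            return false;
        } catch (ArithmeticException e) {
            return true; // non-terminating decimal expansion
        }
    }

    public static void main(String[] args) {
        // new BigDecimal(double) exposes the exact binary value that is stored
        System.out.println(new BigDecimal(0.1));
        System.out.println(new BigDecimal(0.7));

        double r = (0.1 + 0.7) * 10;
        System.out.println(r == 8.0);                   // false
        System.out.println(nearlyEqual(r, 8.0, 1e-12)); // true

        // Decimal arithmetic has the same problem with 1/3 as binary has with 0.1
        System.out.println(oneThirdNeedsRounding());    // true
        System.out.println(BigDecimal.ONE.divide(new BigDecimal(3), MathContext.DECIMAL64));
    }
}
```

The tolerance has to be chosen per application; there is no single value that is right for every computation.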

Similar Messages

  • Floating Point arithmetic conversion

    Hi Everyone,
    Can you tell me how to convert a floating point arithmetic field value to a currency field value?
    thanks,
    chan

    Hi,
    I think a simple MOVE statement should work.
    MOVE l_float TO l_curr.
    Make sure that the currency field is long enough.
    Thanks,
    Vinod.
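
For readers outside ABAP, the same idea (assign a binary float to a fixed-decimal currency field and let the assignment round) can be sketched in Java. This is only an analogy: the two decimal places and the HALF_UP rounding mode are assumptions made for the example, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class FloatToCurrency {
    /** Converts a double to a fixed-point amount with two decimal places. */
    static BigDecimal toCurrency(double value) {
        // BigDecimal.valueOf goes through Double.toString, so 1234.5 becomes "1234.5"
        return BigDecimal.valueOf(value).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(toCurrency(1234.5));            // 1234.50
        System.out.println(toCurrency(7.999999999999999)); // 8.00
    }
}
```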

  • Port of Giac [Longfloat] Library to HP Prime allowing [Variable Precision] Floating Point Arithmetic

    HP Prime CAS is based on Giac, but [ misses ] some of its Special Purpose Libraries like the Giac [ Longfloat ] Library, which if [ Ported ] would allow HP Prime to be the First ( handheld ) Calculator to provide [ Variable Precision ] Floating Point Arithmetic routines ( fully integrated at its CAS Kernel level ). HP Prime already have internal calls to [ Longfloat ] library, but resulting in [ Error Messages ], like when selecting more than 14 Digits in [ evalf ] Numerical evaluation, as for example: evalf( 1/7, 14 ) producing 0.142857142857 and evalf( 1/7, 15 ) resulting in "Longfloat library not available Error: Bad Argument Value" The same happens when one tries to Extend the [ Digits ] variable to a value greater than 13, like Digits := 50 which returns Digits := 13 as output ( from any specified value higher than 13 ).  The porting of [ Longfloat ] library to HP Prime, would open many New opportunities in [ handheld ] Numerical Computation, usually available only on Top Level Computer Algebra Systems, like Maple, Mathematica or Maxima, and also on Giac/XCas. Its worth mentioning that Any [ Smartphone ] with Xcas/Giac App installed, can fully explore [ Variable Precision ] Floating Point Arithmetic, on current ARM based architectures, which means that a Port of [ Longfloat ] Library from Giac to HP Prime, although requiring some considerable amount of labor, is Not an impossible task. The Benefits of such Longfloat [ Porting ] to a handheld Calculator like HP Prime, would put it several levels Up on the list of Top current Calculator Features, miles and miles away from competitors like TI Nspire CX CAS and Casio ClassPad II fx-CP 400 ... Even HP 49/50g have third party developed routines with limited Variable Precision floating point support, while such feature is Not fully integrated to their native CAS Kernel. 
For those who do not see "plenty" reason for a [ Longfloat ] Porting to HP Prime, it's needless to say that the PRIMARY reason for ANY [ CALCULATOR ] is to CALCULATE ! and besides Symbolic Computation ( already implemented on all contemporary top calculator models ), Arbitrary / [ Variable Precision ] Floating Point Arithmetic is simply The TOP of the TOP ( of the IceCream ) in [ Numerical ] Computation ! ( and besides Computer Algebra Manipulation routines, one of the Main reasons for the initial development of the major packages like Maple, Mathematica or Maxima ).

    Thanks for the Link to [ HPMuseum.org ] Page with Valuable Details about the Internal Floating Point implementations both on Home and CAS environments of HP Prime. It's interesting to point to the fact that HP 49/50g has a [ Longfloat ] Version 3.93 package implementation ( with the Same Name but Distinct Code from the Giac Library ) available at [ http://www.hpcalc.org/details.php?id=5363 ] It's also worth mentioning [ Wikipedia ] pages on Arbitrary Precision Arithmetic like [ https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic ], [ https://en.wikipedia.org/wiki/List_of_arbitrary-precision_arithmetic_software ] and [ https://en.wikipedia.org/wiki/List_of_computer_algebra_systems ] and the Xcas/Giac project at [ https://en.wikipedia.org/wiki/Xcas#Giac ] and Official Site at [ http://www-fourier.ujf-grenoble.fr/~parisse/giac.html ] It would be a Dream come True if a Fully Integrated Variable Precision Floating Point Arithmetic package were definitively incorporated into the HP Prime CAS Kernel, like the Giac [ Longfloat ] Library, allowing the Prime to be the First calculator with such Resource truly incorporated at its [ Kernel ] level ( and not like an optional third party module as the HP 49/50g one, which lacks complete integration with its Kernel, since HP 49/50g does not have native support for Longfloats ).
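
To make concrete what a call such as evalf( 1/7, 50 ) is asking for, here is a small sketch of variable-precision division using Java's BigDecimal. It is not Giac's Longfloat code and says nothing about how a port to HP Prime would look; the class and method names are made up for the illustration.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class VariablePrecision {
    /** Computes 1/7 rounded to the requested number of significant digits. */
    static BigDecimal oneSeventh(int digits) {
        return BigDecimal.ONE.divide(BigDecimal.valueOf(7), new MathContext(digits));
    }

    public static void main(String[] args) {
        System.out.println(oneSeventh(12)); // 0.142857142857
        System.out.println(oneSeventh(50)); // 50 significant digits
    }
}
```

The precision is a run-time parameter here, which is the property the post calls [ Variable Precision ].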

  • Inline functions in C, gcc optimization and floating point arithmetic issues

    For several days I really have become a fan of Alchemy. But after intensive testing I have found several issues which I'd like to solve but I can't without any help.
    So...I'm porting an old game console emulator written by me in ANSI C. The code is working on both gcc and VisualStudio without any modification or crosscompile macros. The only platform code is the audio and video output which is out of scope, because I have ported audio and video witin AS3.
    Here are the issues:
    1. Inline functions - Having only a single inline function makes the code work incorrectly (although not crash), whether or not optimization is enabled (-O0 or -O3). My current workaround is converting the inline functions to macros, which achieves the same effect. Any ideas why inline functions break the code?
    2. Compiler optimizations - well, my project consists of many C files one of which is called flash.c and it contains the main and exported functions. I build the project as follows:
    gcc -c flash.c -O0 -o flash.o     //Please note the -O0 option!!!
    gcc -c file1.c -O3 -o file1.o
    gcc -c file2.c -O3 -o file2.o
    ... and so on
    gcc *.o -swc -O0 -o emu.swc   //Please note the -O0 option again!!!
    mxmlc.exe -library-path+=emu.swc --target-player=10.0.0 Emu.as
    for file in $( ls *.o ) //Removes the obj files
        do
            rm $file
        done
    If I define any option different from -O0 in gcc -c flash.c -O0 -o flash.o the program stops working correctly exactly as in the inline functions case (but still does not crash or print any errors in debug). flash.c has 4 static functions to be exported to AS3 and the main function. Do you know why?
    If I define any option different from -O0 in gcc *.o -swc -O0 -o emu.swc the program stops working correctly exactly as above, but if I specify -O1, -O2 or -O3 the SWC file gets up to 2x smaller (for -O3). Why? Is there a method to optimize all the obj files except flash.o, because I suspect a similar issue as when compiling it?
    3. Floating point issues - this is the worst one. My code is mainly based on integer arithmetic but in 1-2 places it requires floating point arithmetic. One of them is the conversion of a 16-bit 44.1 kHz sound buffer to a float buffer with the same sample rate but with samples in the range from -1.0 to 1.0.
    My code:
    void audio_prepare_as()
    {
        uint32 i;
        for(i=0;i<audioSamples;i+=2)
        {
            audiobuffer[i] = (float)snd.buffer[i]/32768;
            audiobuffer[i+1] = (float)snd.buffer[i+1]/32768;
        }
    }
    My audio playback is working perfectly, but not when using the above conversion. I have inspected the float numbers - all incorrect and invalid. I tried other code with simple floats - same story. As if Alchemy refuses to work with floats. What is wrong? I have another place where I must resize the framebuffer and there I have a float involved - same crap. Please help me?
    Found the floating point problem: audiobuffer is written to a ByteArray and then used in AS. But C floats are obviously not the same as those in AS3. Now the floating point is resolved.
    The optimization issues remain! I really need to speed up my code.
    Thank you in advance!
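
The post does not say exactly how the C floats and the AS3 floats differed. One common cause of "all incorrect and invalid" values when floats cross a byte-array boundary is a byte-order mismatch, so the sketch below, written in Java rather than C, shows the sample scaling from the question together with explicit control over the byte order. Treat the byte-order explanation as an assumption; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class AudioConvert {
    /** Scales signed 16-bit PCM samples to floats in the range [-1.0, 1.0). */
    static float[] toFloat(short[] pcm) {
        float[] out = new float[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            out[i] = pcm[i] / 32768.0f;
        }
        return out;
    }

    /** Serializes floats as IEEE 754 single precision in the requested byte order. */
    static byte[] toBytes(float[] samples, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(samples.length * Float.BYTES).order(order);
        for (float s : samples) {
            buf.putFloat(s);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        float[] f = toFloat(new short[] {-32768, 0, 16384});
        System.out.println(f[0] + " " + f[1] + " " + f[2]); // -1.0 0.0 0.5

        // The same value 0.5f yields reversed bytes depending on the order
        byte[] big = toBytes(new float[] {0.5f}, ByteOrder.BIG_ENDIAN);
        byte[] little = toBytes(new float[] {0.5f}, ByteOrder.LITTLE_ENDIAN);
        System.out.println(big[0] + " vs " + little[0]); // 63 vs 0
    }
}
```

A reader that assumes the wrong order sees a valid bit pattern for a completely different number, which matches the symptom of floats that look like garbage.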

    Dear Bernd,
    I am still unable to run the optimizations and turn on the inline functions. None of the inline functions contain any stdlib function, just pure assignments, reads, simple arithmetic and bitwise operations.
    In fact, the file containing the main function and those functions for export in AS3 did have memset and memcpy. I tried your suggestion and put the code above the functions calling memset and memcpy. It did not work so I put the code in a header which is included topmost in each C file. The only system header I use is malloc.h and it is included topmost. In another C file I use pow, sin and log10 from math.h but I removed it and did the same thing:
    //shared.h
    #ifndef _SHARED_H_
    #define _SHARED_H_
    #include <malloc.h>
    static void * custom_memmove( void * destination, const void * source, unsigned int num ) {
      void *result; 
      __asm__("%0 memmove(%1, %2, %3)\n" : "=r"(result) : "r"(destination), "r"(source), "r"(num)); 
      return result; 
    }
    static void * custom_memcpy ( void * destination, const void * source, unsigned int num ) { 
      void *result; 
      __asm__("%0 memcpy(%1, %2, %3)\n" : "=r"(result) : "r"(destination), "r"(source), "r"(num)); 
      return result; 
    }
    static void * custom_memset ( void * ptr, int value, unsigned int num ) { 
      void *result; 
      __asm__("%0 memset(%1, %2, %3)\n" : "=r"(result) : "r"(ptr), "r"(value), "r"(num)); 
      return result; 
    }
    static float custom_pow(float x, int y) {
        float result;
      __asm__("%0 pow(%1, %2)\n" : "=r"(result) : "r"(x), "r"(y));
      return result;
    }
    static double custom_sin(double x) {
        double result;
      __asm__("%0 sin(%1)\n" : "=r"(result) : "r"(x));
      return result;
    }
    static double custom_log10(double x) {
        double result;
      __asm__("%0 log10(%1)\n" : "=r"(result) : "r"(x));
      return result;
    }
    #define memmove custom_memmove
    #define memcpy custom_memcpy
    #define memset custom_memset
    #define pow custom_pow
    #define sin custom_sin
    #define log10 custom_log10 
    #include "types.h"
    #include "macros.h"
    #include "m68k.h"
    #include "z80.h"
    #include "genesis.h"
    #include "vdp.h"
    #include "render.h"
    #include "mem68k.h"
    #include "memz80.h"
    #include "membnk.h"
    #include "memvdp.h"
    #include "system.h"
    #include "loadrom.h"
    #include "input.h"
    #include "io.h"
    #include "sound.h"
    #include "fm.h"
    #include "sn76496.h" 
    #endif /* _SHARED_H_ */ 
    It still behaves the same way as if nothing was changed (works incorrectly - displays jerk which does not move, whereas the image is supposed to move)
    As I am porting an emulator (Sega Mega Drive) I use many arrays of function pointers for implementing the opcodes of the CPUs. Could this be an issue?
    I did a workaround for the floating point problem but processing is very slow so I hear only bzzt bzzt but this is for now out of scope. The emulator compiled with gcc runs at 300 fps on a 1.3 GHz machine, whereas my non-optimized AVM2 code compiled by Alchemy produces 14 fps. The pure rendering is super fast and the problem lies in the computational power of AVM. The frame buffer and the emulation are generated in the C code and only the pixels are copied to AS3, where they are plotted in a BitmapData. On a 2.0 GHz dual core I achieved only 21 fps. The goal is 60 fps to have smooth audio and video. But this is off-topic. After all, everything works (slowly) without optimization, and I would like to turn it on somehow. Suggestions?
    Here is the file with the main function:
    #include "shared.h"
    #include "AS3.h"
    #define FRAMEBUFFER_LENGTH    (320*240*4)
    static uint8* framebuffer;
    static uint32  audioSamples;
    AS3_Val sega_rom(void* self, AS3_Val args)
    {
        int size, offset, i;
        uint8 hardware;
        uint8 country;
        uint8 header[0x200];
        uint8 *ptr;
        AS3_Val length;
        AS3_Val ba;
        AS3_ArrayValue(args, "AS3ValType", &ba);
        country = 0;
        offset = 0;
        length = AS3_GetS(ba, "length");
        size = AS3_IntValue(length);
        ptr = (uint8*)malloc(size);
        AS3_SetS(ba, "position", AS3_Int(0));
        AS3_ByteArray_readBytes(ptr, ba, size);
        //FILE* f = fopen("boris_dump.bin", "wb");
        //fwrite(ptr, size, 1, f);
        //fclose(f);
        if((size / 512) & 1)
        {
            size -= 512;
            offset += 512;
            memcpy(header, ptr, 512);
            for(i = 0; i < (size / 0x4000); i += 1)
            {
                deinterleave_block(ptr + offset + (i * 0x4000));
            }
        }
        memset(cart_rom, 0, 0x400000);
        if(size > 0x400000) size = 0x400000;
        memcpy(cart_rom, ptr + offset, size);
        /* Free allocated file data */
        free(ptr);
        hardware = 0;
        for (i = 0x1f0; i < 0x1ff; i++)
        {
            switch (cart_rom[i]) {
            case 'U':
                hardware |= 4;
                break;
            case 'J':
                hardware |= 1;
                break;
            case 'E':
                hardware |= 8;
                break;
            }
        }
        if (cart_rom[0x1f0] >= '1' && cart_rom[0x1f0] <= '9') {
            hardware = cart_rom[0x1f0] - '0';
        } else if (cart_rom[0x1f0] >= 'A' && cart_rom[0x1f0] <= 'F') {
            hardware = cart_rom[0x1f0] - 'A' + 10;
        }
        if (country) hardware=country; //simple autodetect override
        //From PicoDrive
        if (hardware&8)
        {
            hw=0xc0; vdp_pal=1;
        } // Europe
        else if (hardware&4)
        {
            hw=0x80; vdp_pal=0;
        } // USA
        else if (hardware&2)
        {
            hw=0x40; vdp_pal=1;
        } // Japan PAL
        else if (hardware&1)
        {
            hw=0x00; vdp_pal=0;
        } // Japan NTSC
        else
            hw=0x80; // USA
        if (vdp_pal) {
            vdp_rate = 50;
            lines_per_frame = 312;
        } else {
            vdp_rate = 60;
            lines_per_frame = 262;
        }
        /*SRAM*/
        if(cart_rom[0x1b1] == 'A' && cart_rom[0x1b0] == 'R')
        {
            save_start = cart_rom[0x1b4] << 24 | cart_rom[0x1b5] << 16 |
                cart_rom[0x1b6] << 8  | cart_rom[0x1b7] << 0;
            save_len = cart_rom[0x1b8] << 24 | cart_rom[0x1b9] << 16 |
                cart_rom[0x1ba] << 8  | cart_rom[0x1bb] << 0;
            // Make sure start is even, end is odd, for alignment
            // A ROM that I came across had the start and end bytes of
            // the save ram the same and wouldn't work.  Fix this as seen
            // fit, I know it could probably use some work. [PKH]
            if(save_start != save_len)
            {
                if(save_start & 1) --save_start;
                if(!(save_len & 1)) ++save_len;
                save_len -= (save_start - 1);
                saveram = (unsigned char*)malloc(save_len);
                // If save RAM does not overlap main ROM, set it active by default since
                // a few games can't manage to properly switch it on/off.
                if(save_start >= (unsigned)size)
                    save_active = 1;
            }
            else
            {
                save_start = save_len = 0;
                saveram = NULL;
            }
        }
        else
        {
            save_start = save_len = 0;
            saveram = NULL;
        }
        return AS3_Int(0);
    }
    AS3_Val sega_init(void* self, AS3_Val args)
    {
        system_init();
        audioSamples = (44100 / vdp_rate)*2;
        framebuffer = (uint8*)malloc(FRAMEBUFFER_LENGTH);
        return AS3_Int(vdp_rate);
    }
    AS3_Val sega_reset(void* self, AS3_Val args)
    {
        system_reset();
        return AS3_Int(0);
    }
    AS3_Val sega_frame(void* self, AS3_Val args)
    {
        uint32 width;
        uint32 height;
        uint32 x, y;
        uint32 di, si, r;
        uint16 p;
        AS3_Val fb_ba;
        AS3_ArrayValue(args, "AS3ValType", &fb_ba);
        system_frame(0);
        AS3_SetS(fb_ba, "position", AS3_Int(0));
        width = (reg[12] & 1) ? 320 : 256;
        height = (reg[1] & 8) ? 240 : 224;
        for(y=0;y<240;y++)
        {
            for(x=0;x<320;x++)
            {
                /* Parenthesized: + binds tighter than <<, so 1280*y + x<<2
                   would index far past the end of framebuffer. */
                di = 1280*y + (x<<2);
                si = (y << 10) + ((x + bitmap.viewport.x) << 1);
                p = *((uint16*)(bitmap.data + si));
                framebuffer[di + 3] = (uint8)((p & 0x1f) << 3);
                framebuffer[di + 2] = (uint8)(((p >> 5) & 0x1f) << 3);
                framebuffer[di + 1] = (uint8)(((p >> 10) & 0x1f) << 3);
            }
        }
        AS3_ByteArray_writeBytes(fb_ba, framebuffer, FRAMEBUFFER_LENGTH);
        AS3_SetS(fb_ba, "position", AS3_Int(0));
        r = (width << 16) | height;
        return AS3_Int(r);
    }
    AS3_Val sega_audio(void* self, AS3_Val args)
    {
        AS3_Val ab_ba;
        AS3_ArrayValue(args, "AS3ValType", &ab_ba);
        AS3_SetS(ab_ba, "position", AS3_Int(0));
        AS3_ByteArray_writeBytes(ab_ba, snd.buffer, audioSamples*sizeof(int16));
        AS3_SetS(ab_ba, "position", AS3_Int(0));
        return AS3_Int(0);
    }
    int main()
    {
        AS3_Val romMethod = AS3_Function(NULL, sega_rom);
        AS3_Val initMethod = AS3_Function(NULL, sega_init);
        AS3_Val resetMethod = AS3_Function(NULL, sega_reset);
        AS3_Val frameMethod = AS3_Function(NULL, sega_frame);
        AS3_Val audioMethod = AS3_Function(NULL, sega_audio);
        // construct an object that holds references to the functions
        AS3_Val result = AS3_Object("sega_rom: AS3ValType, sega_init: AS3ValType, sega_reset: AS3ValType, sega_frame: AS3ValType, sega_audio: AS3ValType",
            romMethod, initMethod, resetMethod, frameMethod, audioMethod);
        // Release
        AS3_Release(romMethod);
        AS3_Release(initMethod);
        AS3_Release(resetMethod);
        AS3_Release(frameMethod);
        AS3_Release(audioMethod);
        // notify that we initialized -- THIS DOES NOT RETURN!
        AS3_LibInit(result);
        // should never get here!
        return 0;
    }

  • [Solved] Bash and Floating point arithmetic

    I didn't realize how troublesome floating point numbers can be until now.
    What I want to do should be simple I dare say:
    properRounding( ( currentTime - downloadTime ) / ( dueTime - downloadTime ) * 100 )
    however best I've been able to achieve so far is this:
    echo "($currentTime-$downloadTime)/($dueTime-$downloadTime)*100" | bc -l
    Which prints the correct floating point value.
    I've tried to put the result in a variable, but I must be doing it wrong as I get the most peculiar error. I can live without it, but it would make life easier.
    As for the rounding, that is a must. I've read that if you remove the -l param from bc, then it will round, but in my case something goes wrong as I just get the value 0 in return and besides, concluded after a simple test, bc always rounds down as in integer division, which I can not use.
    So of course I'll continue reading and hopefully someday arrive at a solution, but I would very much appreciate if someone could lend me a hand.
    This after all is not just a learning experience, I'm trying to create something useful.
    Best regards.
    edit:
    nb:
    all variables are integers.
    Last edited by zacariaz (2012-09-09 14:50:18)

    Just for the fun of it, here's my progress thus far... Well, there really isn't much more to do. The rest is a conky thing.
    #!/bin/bash
    # Variables from unitinfo.txt - date as unix timestamps.
    dueTime="$(date +%s -d "$(grep 'Due time: ' ~/unitinfo.txt | cut -c11-)")"
    if [ "$1" = "end" ]
    then echo $dueTime
    fi
    downloadTime="$(date +%s -d "$(grep 'Download time: ' ~/unitinfo.txt | cut -c16-)")"
    if [ "$1" = "start" ]
    then echo $downloadTime
    fi
    progress="$(grep 'Progress: ' ~/unitinfo.txt | cut -c11-12 | sed 's/ *$//')"
    if [ "$1" = "prog1" ]
    then echo $progress
    fi
    # The rest
    #progress valued 0-1 for use with conky progress bars
    progress2=$( echo 2k $progress 100 / f | dc )
    if [ "$1" = "prog2" ]
    then echo $progress2
    fi
    # Current time - unix timestamp.
    currentTime="$(date +%s)"
    # Remaining time - unix timestamp
    remainingTime=$(( dueTime-$currentTime ))
    if [ "$1" = "remain" ]
    then echo $remainingTime
    fi
    # Elapsed time - unix timestamp
    elapsedTime=$(( currentTime-downloadTime ))
    if [ "$1" = "elap1" ]
    then echo $elapsedTime
    fi
    # Total amount of time available - unix timestamp
    totalTime=$(( dueTime-downloadTime ))
    if [ "$1" = "total" ]
    then echo $totalTime
    fi
    # How much time has elapsed in percent
    progress3="$(echo 3k $elapsedTime $totalTime / 100 \* 0k 0.5 + 1 / f | dc)"
    if [ "$1" = "elap2" ]
    then echo $progress3
    fi
    # Like the above but 0-1
    progress4=$( echo 2k $progress3 100 / f | dc )
    if [ "$1" = "elap3" ]
    then echo $progress4
    fi
    # In percent, expected completion vs $dueTime - less than 100 is better.
    expectedCompletion="$( echo 3k $elapsedTime 10000 \* $progress / $totalTime / 0k 0.5 + 1 / f | dc )"
    if [ "$1" = "exp1" ]
    then echo $expectedCompletion
    fi
    # Same as above, but unix timestamp
    expectedCompletion2=$(( downloadTime+(expectedCompletion*totalTime/100) ))
    if [ "$1" = "exp2" ]
    then echo $expectedCompletion2
    fi
    #efficiency
    if [ "$1" = "ef1" ]; then
    if [ $progress -lt $progress3 ]
    then echo "you're behind schedule."
    elif [ $progress -eq $progress3 ]
    then echo "You're right on schedule."
    else
    echo "You're ahead of schedule."
    fi
    fi
    if [ "$1" = "ef2" ]; then
    if [ $expectedCompletion -gt 100 ]
    then echo "You're not going to make it!"
    else
    echo "You're going to make it!"
    fi
    fi
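
Since the question notes that all variables are integers, the rounded percentage can also be computed without any floating point at all. The sketch below shows the arithmetic in Java rather than bash; it is an alternative technique, not what the script above does, and the class and method names are hypothetical.

```java
final class ProgressPercent {
    /**
     * Returns (current - download) / (due - download) * 100, rounded half up,
     * using integer arithmetic only. Assumes due > download and current >= download.
     */
    static long roundedPercent(long current, long download, long due) {
        long elapsed = current - download;
        long total = due - download;
        // Adding half the divisor before dividing turns truncation into rounding
        return (100 * elapsed + total / 2) / total;
    }

    public static void main(String[] args) {
        System.out.println(roundedPercent(150, 100, 300)); // 25
        System.out.println(roundedPercent(101, 100, 300)); // 1  (0.5 rounds up)
        System.out.println(roundedPercent(101, 100, 500)); // 0  (0.25 rounds down)
    }
}
```

Multiplying by 100 before dividing is what avoids the result of 0 that plain integer division produces.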

  • Error on floating point?

    One can expect that 1.2 * 3.0 equals 3.60
    But the following statement has the result: 3.5999999999999996
    - System.out.println(1.2 * 3.0);
    Why?
    How can I control or estimate the floating point error?
    Thanks in advance!

    It is not a Java problem or a Java error. It is inherent to floating-point arithmetic.
    1.2 can not be exactly represented in binary floating-point arithmetic. But 1.25 (that is 5 * (2 ^ -2)) can be.
    If your problem requires exact decimal arithmetic, use BigDecimal instead. (It is very slow compared to the conventional floating-point arithmetic).
    Please consult a textbook on numerical calculus for the techniques of dealing with floating-point error - it depends on the algorithm that you use for solving your problem.
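
As a rough illustration of the points in this reply, the following Java sketch shows that 1.25 is stored exactly while 1.2 is not, uses Math.ulp to express the error of the product in units in the last place, and repeats the multiplication with BigDecimal. The helper withinUlps and the bound of 2 ulps are assumptions chosen for this example, not a general rule.

```java
import java.math.BigDecimal;

final class UlpDemo {
    /** Returns true if a and b differ by at most n units in the last place of b. */
    static boolean withinUlps(double a, double b, int n) {
        return Math.abs(a - b) <= n * Math.ulp(b);
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal(1.25)); // 1.25, exactly representable
        System.out.println(new BigDecimal(1.2));  // the nearest double, not exactly 1.2

        double p = 1.2 * 3.0;
        System.out.println(p == 3.6);              // false
        System.out.println(withinUlps(p, 3.6, 2)); // true: the error is tiny and bounded

        // Exact decimal arithmetic, constructed from strings rather than doubles
        System.out.println(new BigDecimal("1.2").multiply(new BigDecimal("3.0"))); // 3.60
    }
}
```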

  • Dtrace Floating Point gives error on x86

    When I try to create a floating point constant in dtrace x86:
    BEGIN
    {
    printf ("%f", 1.0);
    exit (1);
    }
    I get the error:
    dtrace: failed to compile script special.d: line 3: floating-point constants are not permitted
    Am I using the floating point constant incorrectly, or are floating point constants not permitted in the x86 platform.
    Thanks,
    Chip

    Then what is meant at the bottom of page 48 of the
    Solaris Dynamic Tracing Guide where it talks about
    floating-point constants?
    Chip
    Sorry for not making that sufficiently clear. We are reserving that syntax for possible future use, but you cannot specify floating-point constants at present, and you cannot perform floating-point arithmetic in D. The only legal use of floating-point is that you can trace one or more data objects or structures that contain floating-point values and format the results using printf() and the various %f, %g formats.
    -Mike

  • Floating point Number & Packed Number

    Hi, can anyone tell me what the difference is between using a floating point number and a packed number?
    When should each be used?

    Packed numbers - type P
    Type P data allows digits after the decimal point. The number of decimal places is generic, and is determined in the program. The value range of type P data depends on its size and the number of digits after the decimal point. The valid size can be any value from 1 to 16 bytes. Two decimal digits are packed into one byte, while the last byte contains one digit and the sign. Up to 14 digits are allowed after the decimal point. The initial value is zero. When working with type P data, it is a good idea to set the program attribute Fixed point arithmetic. Otherwise, type P numbers are treated as integers.
    You can use type P data for such values as distances, weights, amounts of money, and so on.
    Floating point numbers - type F
    The value range of type F numbers is 1x10^-307 to 1x10^308 for positive and negative numbers, including 0 (zero). The accuracy range is approximately 15 decimals, depending on the floating point arithmetic of the hardware platform. Since type F data is internally converted to a binary system, rounding errors can occur. Although the ABAP processor tries to minimize these effects, you should not use type F data if high accuracy is required. Instead, use type P data.
    You use type F fields when you need to cope with very large value ranges and rounding errors are not critical.
    Using I and F fields for calculations is quicker than using P fields. Arithmetic operations using I and F fields are very similar to the actual machine code operations, while P fields require more support from the software. Nevertheless, you have to use type P data to meet accuracy or value range requirements.
    reward if useful
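
The storage layout described above (two decimal digits per byte, with the last byte holding one digit and the sign) can be sketched as follows. The sketch is in Java, not ABAP, and the sign nibbles 0xC for plus and 0xD for minus follow the common IBM packed-decimal convention, which is an assumption here since the reply does not name them. Class and method names are hypothetical.

```java
final class PackedDecimal {
    /** Packs value into length bytes: two digits per byte, last byte = digit + sign. */
    static byte[] pack(long value, int length) {
        byte[] out = new byte[length];
        long digits = Math.abs(value);
        int sign = value < 0 ? 0xD : 0xC; // assumption: IBM-style sign nibbles
        out[length - 1] = (byte) (((digits % 10) << 4) | sign);
        digits /= 10;
        for (int i = length - 2; i >= 0; i--) {
            int low = (int) (digits % 10);
            digits /= 10;
            int high = (int) (digits % 10);
            digits /= 10;
            out[i] = (byte) ((high << 4) | low);
        }
        if (digits != 0) {
            throw new ArithmeticException("value does not fit in " + length + " bytes");
        }
        return out;
    }

    public static void main(String[] args) {
        for (byte b : pack(12345, 3)) {
            System.out.printf("%02X ", b); // 12 34 5C
        }
        System.out.println();
    }
}
```

Because every decimal digit is stored as such, a value like 0.10 scaled to an integer number of cents is held exactly, which is why the reply recommends type P for money.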

  • T1000 Floating Point Arithmetic

    I read somewhere that the T1000 does not scale Java apps very well with multiple threads if those threads do a lot of floating point arithmetic. Can someone explain and provide a simple situation in Java that would manifest this problem?
    I am testing a multithreaded java app on the T1000 and it just does not scale. I need to try and identify why. The java app itself does not do any floating point arithmetic but maybe the library software it uses does.

    Short version is that the T1000 only has one FPU to share amongst the 8 cores (and 32 threads). So, while the integer ops act like it's on a 8-32 core box, the fpu apps act like it's a single core (because it is, in that case)...
    I've read some docs on the web that suggest the T1000 doesn't scale Java well in the default configuration. They had tweaks that needed to be done to the OS and resulted in massive speed increases. Unfortunately, I can't remember any details and don't have the links available :(
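
A simple probe along the lines the question asks for might look like the sketch below. It runs the same floating-point loop and a comparable integer loop on an increasing number of threads and prints the wall-clock time of each. The iteration count, the loop bodies and all names are arbitrary choices for this example, and the timings are only indicative because the JIT compiler and the scheduler both influence them.

```java
final class FpScalingProbe {
    static volatile double sink; // keeps the JIT from discarding the loop results

    /** A loop dominated by floating-point multiply and add. */
    static void fpWork(int iterations) {
        double x = 1.0;
        for (int i = 0; i < iterations; i++) {
            x = x * 1.0000001 + 0.5;
        }
        sink = x;
    }

    /** A comparable loop that uses only integer operations. */
    static void intWork(int iterations) {
        long x = 1;
        for (int i = 0; i < iterations; i++) {
            x = x * 31 + 7;
        }
        sink = x;
    }

    /** Runs the task on the given number of threads and returns elapsed nanoseconds. */
    static long timeThreads(int threads, Runnable task) {
        Thread[] pool = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(task);
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        for (int threads = 1; threads <= 32; threads *= 2) {
            long fpNs = timeThreads(threads, () -> fpWork(n));
            long intNs = timeThreads(threads, () -> intWork(n));
            System.out.printf("%2d threads: fp %d ms, int %d ms%n",
                    threads, fpNs / 1_000_000, intNs / 1_000_000);
        }
    }
}
```

If the floating-point time grows roughly in proportion to the thread count while the integer time stays flat up to the number of hardware threads, that would be consistent with the shared FPU described in the reply.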

  • Can I implement advanced control algorithm with floating-point computations in Ni 7831R ?

    Hi,
    I plan to use an NI 7831R to control a MIMO nano-positioning stage with a servo rate above 20 kHz. The control algorithm is based on robust control design and is much more complex than PID. It also includes floating-point calculations. Can I implement such an algorithm with the NI 7831R?
    By the way, is there any way to expand the number of FPGA gates for the NI 7831R? Suppose I run out of FPGA gates (1M), can I add more by buying some different hardware?
    Thanks
    Jingyan
    Message Edited by Jingyan on 08-22-2006 01:45 PM

    Jingyan,
    as long as there is no FPU core implemented on the FPGA, these devices only support integer arithmetic. NI's FPGA targets currently don't contain an FPU core, so there is no native floating point arithmetic available.
    Still there are several options to implement floating point arithmetic on your own or to work around this target specific limitation. Here are some links that might help:
    Floating-Point Addition in LabVIEW FPGA
    Multiplying, Dividing and Scaling in LabVIEW FPGA
    The NI 7831R uses a 1M-gate FPGA. If your application requires more gates, the NI 7833R (3M) is a good solution.
    I hope that helps,
    Jochen Klier
    National Instruments Germany
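
One common way to work around an integer-only target is fixed-point arithmetic, where a real number is stored as an integer scaled by a power of two. The sketch below shows the idea in Java with a Q16.16 format; it is not LabVIEW FPGA code, and the format, class and method names are assumptions for illustration.

```java
final class FixedPoint {
    static final int FRAC_BITS = 16; // Q16.16: 16 integer bits, 16 fractional bits

    /** Converts a real value to its scaled-integer representation. */
    static int fromDouble(double v) {
        return (int) Math.round(v * (1 << FRAC_BITS));
    }

    /** Converts a scaled integer back to a real value. */
    static double toDouble(int q) {
        return q / (double) (1 << FRAC_BITS);
    }

    /** Addition is plain integer addition because both operands share the scale. */
    static int add(int a, int b) {
        return a + b;
    }

    /** Multiplication needs a wider intermediate and a shift to restore the scale. */
    static int mul(int a, int b) {
        return (int) (((long) a * b) >> FRAC_BITS);
    }

    public static void main(String[] args) {
        int a = fromDouble(1.5);
        int b = fromDouble(2.25);
        System.out.println(toDouble(mul(a, b))); // 3.375
        System.out.println(toDouble(add(a, b))); // 3.75
    }
}
```

The word length and the position of the binary point have to be chosen from the signal ranges of the controller, which is the main design effort with this approach.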

  • 128-bit floating point numbers on new AMD quad-core Barcelona?

    There's quite a lot of buzz over at Slashdot about the new AMD quad core chips, announced yesterday:
    http://hardware.slashdot.org/article.pl?sid=07/02/10/0554208
    Much of the excitement is over the "new vector math unit referred to as SSE128", which is integrated into each [?!?] core; Tom Yager, of Infoworld, talks about it here:
    Quad-core Opteron? Nope. Barcelona is the completely redesigned x86, and it’s brilliant
    Now here's my question - does anyone know what the inputs and the outputs of this coprocessor look like? Can it perform arithmetic [or, God forbid, trigonometric] operations [in hardware] on 128-bit quad precision floats? And, if so, will LabVIEW be adding support for it? [Compare here versus here.]
    I found a little bit of marketing-speak blather at AMD about "SSE 128" in this old PDF Powerpoint-ish presentation, from June of 2006:
    http://www.amd.com/us-en/assets/content_type/DownloadableAssets/PhilHesterAMDAnalystDayV2.pdf
    WARNING: PDF DOCUMENT
    Page 13: "Dual 128-bit SSE dataflow, Dual 128-bit loads per cycle"
    Page 14: "128-bit SSE and 128-bit Loads, 128b FADD, 128 bit FMUL, 128b SSE, 128b SSE"
    etc etc etc
    While it's largely just gibberish to me, "FADD" looks like what might be a "floating point adder", and "FMUL" could be a "floating point multiplier", and God forbid that the two "SSE" units might be capable of computing some 128-bit cosines. But I don't know whether that old paper is even applicable to the chip that was released yesterday, and I'm just guessing as to what these things might mean anyway.
    Other than that, though, AMD's main website is strangely quiet about the Barcelona announcement. [Memo to AMD marketing - if you've just released the greatest thing since sliced bread, then you need to publicize the fact that you've just released the greatest thing since sliced bread...]

    I posted a query over at the AMD forums, and here's what I was told.
    I had hoped that e.g. "128b FADD" would be able to do something like the following:
    /* "quad" is a hypothetical 128-bit quad precision  */
    /* floating point number, similar to "long double"  */
    /* in recent versions of C++:                       */
    quad x, y, z;
    x = 1.000000000000000000000000000001;
    y = 1.000000000000000000000000000001;
    /* the hope was that "128b FADD" could perform the  */
    /* following 128-bit addition in hardware:          */
    z = x + y;
    However, the answer I'm getting is that "128b FADD" is just a set of two 64-bit adders running in parallel, which are capable of adding two vectors of 64-bit doubles more or less simultaneously:
    double x[2], y[2], z[2];
    x[0] = 1.000000000000000000000000000001;
    y[0] = 1.000000000000000000000000000001;
    x[1] = 2.000000000000000000000000000222;
    y[1] = 2.000000000000000000000000000222;
    /* Apparently the coordinates of the two "vectors" x & y       */
    /* can be sent to "128b FADD" in parallel, and the following   */
    /* two summations can be computed more or less simultaneously: */
    z[0] = x[0] + y[0];
    z[1] = x[1] + y[1];
    Thus e.g. "128b FADD", working in concert with "128b FMUL", will be able to [more or less] halve the amount of time it takes to compute a dot product of vectors whose coordinates are 64-bit doubles.
    So this "128-bit" circuitry is great if you're doing lots of linear algebra with 64-bit doubles, but it doesn't appear to offer anything in the way of greater precision for people who are interested in precision-sensitive calculations.
    By the way, if you're at all interested in questions of precision sensitivity & round-off error, I'd highly recommend Prof Kahan's page at Cal-Berzerkeley:
    http://www.cs.berkeley.edu/~wkahan/
    PDF DOCUMENT: How JAVA's Floating-Point Hurts Everyone Everywhere
    http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
    PDF DOCUMENT: Matlab's Loss is Nobody's Gain
    http://www.cs.berkeley.edu/~wkahan/MxMulEps.pdf

  • Floating point multiplication

    hello everybody!
    I am using the OpenSPARC T1. In a floating-point multiplication, where are the upper 64 bits (bits 64 to 128) of the product computed and stored: in the FPU, or does it use the SPU?
    thanx in advance

    Hi,
    According to the OpenSPARC T1 microarchitecture specification (page 204):
    The FPU includes three independent execution pipelines:
    Floating-point adder (FPA) – adds, subtracts, compares, conversions
    Floating-point multiplier (FPM) – multiplies
    Floating-point divider (FPD) – divides
    However, keep in mind that all the registers for the floating point operations are kept in the cores.
    This is what the specification (page 31) says about the SPU: "Stream processing unit (SPU) is used for modular arithmetic functions for crypto."

  • Profibus data type converting to floating point

    Hi, 
    Is there an efficient way to convert the incoming data to floating point in the NI cRIO Profibus system?
    1) The system is an NI cRIO 9068 controller with a Comsoft Profibus slave module. We are using LabVIEW 2013.
    2) We are using the Profibus slave example and are able to see an array of unsigned 8-bit data. Please see the output data shown in the attachment.
    3) On the other side of the Profibus system, a third-party Profibus master converts floating-point values and transmits the converted data to the NI Profibus slave.
    Attachments:
    Screenshoot.png 7 KB

    If it's just a matter of converting data types once you have the data in LabVIEW, you can always manually scale and convert the data using the arithmetic functions and the "To Double Precision Float" or "To Single Precision Float" functions. You just have to know which floating-point value each unsigned byte corresponds to. Is that what you're asking?
    If you're asking for a way to do this natively with the Profibus functions, I'm afraid I can't be of much help...
    Ryan K.
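If the master transmits each value as the four raw bytes of an IEEE 754 single-precision float (an assumption; the thread does not say how the master encodes the data), the conversion is a reinterpretation of bytes rather than a scaling. The sketch below uses Java only for illustration; the class name, the method name, and the byte order are hypothetical and must be matched to what the master actually sends. In LabVIEW the analogous step would be a byte-reinterpreting function such as Type Cast rather than arithmetic on the individual bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ProfibusFloatDecoder {
    // Reinterprets four consecutive bytes as one IEEE 754 single-precision value.
    // The byte order is an assumption; it must match the order the master uses.
    static float decode(byte[] data, int offset, ByteOrder order) {
        return ByteBuffer.wrap(data, offset, 4).order(order).getFloat();
    }

    public static void main(String[] args) {
        // 0x41 0x20 0x00 0x00 is the big-endian encoding of 10.0f.
        byte[] received = {0x41, 0x20, 0x00, 0x00};
        System.out.println(decode(received, 0, ByteOrder.BIG_ENDIAN));
    }
}
```

Java bytes are signed while the incoming array is unsigned 8-bit, but the bit patterns are identical, so the reinterpretation is unaffected.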

  • Question in floating point operation

    Hi,
    I have a question about Java floating-point operations.
    public class test
    {
         public static void main(String args[])
         {
              double d1 = 243.35 ;
              double d2 = 2.3 ;
              System.out.println(d1 * d2) ;
              System.out.println((float)d1 * (float)d2) ;
         }
    }
    The result is,
    java version "1.4.1_02"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_02-b06)
    Java HotSpot(TM) Client VM (build 1.4.1_02-b06, mixed mode)
    5.597049999999999E8
    5.5970502E8
    Though the multiplication does not produce a non-terminating decimal like 1/3, the result of the first statement is not accurate enough. In our project, this multiplication involves money and we cannot ignore this.
    Can anyone explain why this is happening? Do I need to convert all the numbers to float to avoid this, or is it a bug?
    ~ Sathiya Dhanapal.

    The underlying problem is that not all decimal numbers can be represented exactly in a binary floating-point representation. But if you perform all calculations using doubles and then round to two fractional digits at the end, you should get a "correct" result UNLESS you have used ill-conditioned formulas that introduce other kinds of arithmetic errors.
    There's another way around this when it comes to counting money, and that's to use integers (long or int). You convert every amount to the lowest monetary unit (like a cent or whatever). Every amount of money can now be represented exactly, but you still have to be careful because the rounding problem is still there (what do you do with the last cent when you split 100 cents three ways?).
    In your example, the "more correct" result you got from using floats instead of doubles is only an illusion. The result has been implicitly rounded because fewer bits have been used. If you round the double result to the same precision as the float result, they're the same.
    The important lesson in all this is TO KNOW WHEN TO ROUND.
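The two rounding decisions mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: the class and method names are hypothetical, `BigDecimal` is swapped in to keep the decimal product exact (the reply itself suggests integers), `RoundingMode.HALF_UP` is one possible rounding policy among several, and the split assumes a non-negative total.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;

final class MoneyRounding {
    // Exact decimal product, rounded once at the end to two fractional digits.
    static BigDecimal roundedProduct(String a, String b) {
        BigDecimal exact = new BigDecimal(a).multiply(new BigDecimal(b));
        return exact.setScale(2, RoundingMode.HALF_UP);
    }

    // Splits an amount held in cents so that the shares always sum to the total;
    // the leftover cents go to the first shares, one cent each.
    static long[] split(long totalCents, int parts) {
        long base = totalCents / parts;
        long leftover = totalCents % parts;
        long[] shares = new long[parts];
        for (int i = 0; i < parts; i++) {
            shares[i] = base + (i < leftover ? 1 : 0);
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(roundedProduct("243.35", "2.3")); // 559.71
        System.out.println(Arrays.toString(split(100, 3)));  // [34, 33, 33]
    }
}
```

Who receives the extra cent is a business rule rather than an arithmetic one; the sketch simply gives it to the first share.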

  • Division without Fixed Point Arithmetic

    Hello,
    I am trying to divide 1 by a decimal, i.e. 1 / 7.75027, in Include RV63A900 of Function Group V61A. Since the 'Fixed Point Arithmetic' checkbox is not checked, I am unable to get this calculation to work.
    I have tried different scenarios of using a floating point field to receive the quotient but nothing has worked.
    Any suggestions? 
    Thank you,
    Chris Mowl

    I use a floating-point variable, something like this:
    data: v_f TYPE f,
             v_c TYPE char12.
          v_f = ( wa_komp_o-netwr / o_komp-mglme ).
    * Due to decimal issues had to use Floating point value
    * for calculation of condition base value
          CALL FUNCTION 'FLTP_CHAR_CONVERSION'
            EXPORTING
              decim = 5
              input = v_f
            IMPORTING
              flstr = v_c.
          v_amount = v_c * 10.
