Floating Point Representations on SPARC (64-bit architecture)

Hi Reader,
I got hold of Sun's "Numerical Computation Guide" (2005) while looking for floating-point representations on 64-bit architectures. It gives nice illustrations of the single and double formats and shows how endianness is handled with two 32-bit words, but it doesn't say how this works on 64-bit SPARC or 64-bit x86.
I might be wrong here, but with all integers and pointers being 64 bits long, do we still need to split a floating-point number and store the halves at lower and higher addresses?
Or is it as simple as the double format having the same bit pattern (1 + 11 + 52 bits) across all architectures (Intel, SPARC, IBM PowerPC, AMD)?
I have tried hard to find documentation that explains how a floating-point number is represented on a 64-bit architecture. Any suggestion would be very helpful.
Thanks for reading. Hope you have something useful to write back.
Regards,
Regmee

The representation of floating-point numbers is specified by the IEEE 754 standard, which covers single-precision (32-bit) and double-precision (64-bit) floating-point numbers; there is also a quad-precision (128-bit) format. OpenSPARC T1 supports both single and double precision in hardware, and can support quad precision through emulation. The fact that this is a 64-bit machine does not affect how the numbers are stored in memory.
The only thing that affects how the numbers are stored in memory is endianness: the SPARC architecture is big-endian, while x86 is little-endian. A double-precision floating-point number in a SPARC register, however, looks the same as a double-precision floating-point number in an x86 register.
formalGuy
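To make the 1 + 11 + 52 layout and the endianness point concrete, here is a minimal Java sketch. The class and method names (DoubleLayout, memoryBytes, and so on) are made up for illustration, and the byte order is simulated with java.nio.ByteBuffer rather than read from real SPARC or x86 memory, so treat it as an illustration of the idea and not as a statement about any particular machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DoubleLayout {
    // Sign: the top bit of the 64-bit pattern.
    static long sign(double d) {
        return Double.doubleToLongBits(d) >>> 63;
    }

    // Biased exponent: the next 11 bits.
    static long biasedExponent(double d) {
        return (Double.doubleToLongBits(d) >>> 52) & 0x7FFL;
    }

    // Fraction: the low 52 bits.
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0x000FFFFFFFFFFFFFL;
    }

    // The 8 bytes of d, lowest address first, for the requested byte order.
    static String memoryBytes(double d, ByteOrder order) {
        byte[] raw = ByteBuffer.allocate(8).order(order).putDouble(d).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double d = -2.5;
        // The value (register view) is the same whatever the byte order.
        System.out.println("sign=" + sign(d)
                + " exponent=" + biasedExponent(d)
                + " fraction=0x" + Long.toHexString(fraction(d)));
        // Only the order of the bytes in memory differs.
        System.out.println("big-endian:    " + memoryBytes(d, ByteOrder.BIG_ENDIAN));
        System.out.println("little-endian: " + memoryBytes(d, ByteOrder.LITTLE_ENDIAN));
    }
}
```

The three fields come out the same in both cases; only the memory dump is reversed, which matches the answer above.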

Similar Messages

Is it possible to have a floating point representation in an array in LabVIEW?

Could anyone tell me how to create part of a VI that generates a user-defined number N of random arrays, each of user-defined length, holding floating-point values?
The values in the array cannot be limited to 0 and 1.
Thanks
Alan Homer

Sorry, I don't understand what you want,
N random arrays? You can build an array of required length, bundle it to a cluster and repeat this X times (Depending on how many arrays you need). Each cluster can then in turn be grouped to an array, whereby each array retains its own length.
You can easily define an array as an array of floating point numbers. Is this what you mean? (See picture)
I also don't quite understand what the values 0 and 1 have to do with your question as a whole.....
Trying to help
Shane
Using LV 6.1 and 8.2.1 on W2k (SP4) and WXP (SP2)
Attachments:
N random arrays.png 5 KB

Floating point multiplication

hello everybody!
I use OpenSPARC T1. In floating-point multiplication, where are the upper 64 bits (64 to 128) of the product computed and stored? In the FPU, or does it use the SPU?
thanx in advance

Hi,
According to the OpenSPARC T1 micro-architecture specification (page 204):
The FPU includes three independent execution pipelines:
Floating-point adder (FPA) adds, subtracts, compares, conversions
Floating-point multiplier (FPM) multiplies
Floating-point divider (FPD) divides
However, keep in mind that all the registers for the floating point operations are kept in the cores.
This is what the specification (page 31) says about the SPU: "Stream processing unit (SPU) is used for modular arithmetic functions for crypto."

Question in floating point operation

Hi,
I have a question about a Java floating-point operation.
public class test {
     public static void main(String args[]) {
          double d1 = 243.35;
          double d2 = 2.3;
          System.out.println(d1 * d2);
          System.out.println((float) d1 * (float) d2);
     }
}
The result is,
java version "1.4.1_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_02-b06)
Java HotSpot(TM) Client VM (build 1.4.1_02-b06, mixed mode)
5.597049999999999E8
5.5970502E8
Although the product is not a non-terminating decimal like 1/3, the result of the first statement is not accurate enough. In our project this multiplication involves money, so we cannot ignore the error.
Can anyone explain why this is happening? Do I need to convert all the numbers to float to avoid it, or is it a bug?
~ Sathiya Dhanapal.

The underlying problem is that not all numbers can be represented exactly in a floating point representation. But if you perform all calculations using doubles and then round to two fractional digits at the end you should get a "correct" result UNLESS you have used ill-conditioned formulas introducing other kinds of arithmetic errors.
There's another way around this when it comes to counting money, and that's to use integers (long or int). You convert every number to the lowest monetary unit (like a cent or whatever). Every amount of money can now be represented exactly, but you still have to be careful because the rounding problem is still there (what do you do with the last cent when you split 100 cents three ways?).
In your example the "more correct" result you got from using floats instead of doubles is only an illusion. The result has been implicitly rounded because fewer bits have been used. If you round the double result to the same precision as the float result, they're the same.
The important lesson in all this is TO KNOW WHEN TO ROUND.
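As a rough sketch of both suggestions, the Java below multiplies the literal values from the question with java.math.BigDecimal and rounds once at the end, and splits an amount held in whole cents. The class name MoneySketch, the helper names, the choice of RoundingMode.HALF_UP, and the rule of giving leftover cents to the first shares are all assumptions made for illustration; a real project would take its rounding rules from its accounting requirements.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class MoneySketch {
    // Multiply two decimal strings exactly, then round once to 2 fractional digits.
    static BigDecimal multiplyAndRound(String a, String b) {
        return new BigDecimal(a)
                .multiply(new BigDecimal(b))
                .setScale(2, RoundingMode.HALF_UP);
    }

    // Split a non-negative amount in cents into `parts` shares that add up
    // to the total again; the leftover cents go to the first shares.
    static long[] splitCents(long totalCents, int parts) {
        long base = totalCents / parts;
        long remainder = totalCents % parts;
        long[] shares = new long[parts];
        for (int i = 0; i < parts; i++) {
            shares[i] = base + (i < remainder ? 1 : 0);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Exact decimal product of the two literals, rounded at the end.
        System.out.println(multiplyAndRound("243.35", "2.3"));
        // 100 cents split three ways: no cent is lost or invented.
        for (long share : splitCents(100, 3)) {
            System.out.println(share);
        }
    }
}
```

Passing the amounts as strings matters here: new BigDecimal(243.35) would start from the binary approximation of the double, not from the decimal value that was typed.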

Convert Floating Point Decimal to Hex

In my application I make some calculations using the DBL floating-point format, and I need to write these values to a file in IEEE 754 floating-point hex format. Is there any way to do this using LabVIEW?

Mike,
Good news. LabVIEW has a function that does exactly what you want. It is well hidden though...
In the Advanced/Data manipulation palette there is a function called Flatten to String. If you feed this function with your DBL precision digital value you get the IEEE-754 hexadecimal floating point representation (64 bit) at the data string terminal (as a text string).
I attached a simple example that shows how it works.
Hope this helps. /Mikael Garcia
Attachments:
ieee754converter.vi 10 KB
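For readers without LabVIEW at hand, the same conversion can be sketched in Java. This is not what the attached VI does internally; it only illustrates what "IEEE 754 hex format" means for a 64-bit double. The class name DoubleHex and its two helper methods are hypothetical.

```java
public class DoubleHex {
    // 16 hex digits holding the raw 64-bit IEEE 754 pattern of d.
    static String toIeeeHex(double d) {
        return String.format("%016X", Double.doubleToRawLongBits(d));
    }

    // Inverse direction: 16 hex digits back to the double they encode.
    static double fromIeeeHex(String hex) {
        return Double.longBitsToDouble(Long.parseUnsignedLong(hex, 16));
    }

    public static void main(String[] args) {
        String hex = toIeeeHex(1.0);
        System.out.println(hex);              // prints 3FF0000000000000
        System.out.println(fromIeeeHex(hex)); // prints 1.0
    }
}
```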

Converting Floating Point

I am receiving numbers from a Tinius Olsen Model 290 Universal Testing System. I made a little program to send commands to the machine and receive the responses back as strings. For the decimal value 3.7620 I get the floating-point number EDC57040 (hex). This is obviously not correct: the correct representation is 4070C5ED, which is obtained by reversing the order of the four bytes. I need to know how to do this in LabVIEW, that is, how to go from the string representation to the correctly byte-swapped floating-point value.

Here is a simple example:
Be aware that if you have an ancient version of LabVIEW, this option (byte order) is not available and you need to swap the bytes manually. No big deal, really.
Message Edited by altenbach on 05-08-2007 01:15 PM
LabVIEW Champion . Do more with less code and in less time .
Attachments:
LittleEndianSGL.png 7 KB
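The manual byte swap can also be written out in text form. The Java sketch below is only an illustration of the operation, not the LabVIEW diagram from the attachment; the class name SwapFloatHex and the method decodeByteSwapped are invented for this example.

```java
public class SwapFloatHex {
    // Interpret 8 hex digits as a single-precision float whose four bytes
    // arrived in reversed order.
    static float decodeByteSwapped(String hex) {
        int raw = Integer.parseUnsignedInt(hex, 16);   // e.g. 0xEDC57040
        int swapped = Integer.reverseBytes(raw);       // becomes 0x4070C5ED
        return Float.intBitsToFloat(swapped);
    }

    public static void main(String[] args) {
        // The string received from the testing machine in the question.
        System.out.println(decodeByteSwapped("EDC57040"));
    }
}
```

With the string from the question this yields a value of about 3.762, which agrees with the reading the poster expected.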

Port of Giac [Longfloat] Library to HP Prime allowing [Variable Precision] Floating Point Arithmetic

HP Prime CAS is based on Giac, but [ misses ] some of its Special Purpose Libraries like the Giac [ Longfloat ] Library, which if [ Ported ] would allow HP Prime to be the First ( handheld ) Calculator to provide [ Variable Precision ] Floating Point Arithmetic routines ( fully integrated at its CAS Kernel level ).
HP Prime already has internal calls to the [ Longfloat ] library, but they result in [ Error Messages ], for example when selecting more than 14 Digits in [ evalf ] Numerical evaluation: evalf( 1/7, 14 ) produces 0.142857142857, while evalf( 1/7, 15 ) results in "Longfloat library not available Error: Bad Argument Value". The same happens when one tries to Extend the [ Digits ] variable to a value greater than 13: Digits := 50 returns Digits := 13 as output ( as does any specified value higher than 13 ).
Porting the [ Longfloat ] library to HP Prime would open many New opportunities in [ handheld ] Numerical Computation, usually available only on Top Level Computer Algebra Systems, like Maple, Mathematica or Maxima, and also on Giac/XCas. It's worth mentioning that Any [ Smartphone ] with the Xcas/Giac App installed can fully explore [ Variable Precision ] Floating Point Arithmetic on current ARM based architectures, which means that a Port of the [ Longfloat ] Library from Giac to HP Prime, although requiring a considerable amount of labor, is Not an impossible task.
The Benefits of such a Longfloat [ Porting ] to a handheld Calculator like HP Prime would put it several levels Up on the list of Top current Calculator Features, miles and miles away from competitors like TI Nspire CX CAS and Casio ClassPad II fx-CP 400 ... Even the HP 49/50g has third party developed routines with limited Variable Precision floating point support, although that feature is Not fully integrated into its native CAS Kernel.
For those who do not see "plenty" of reason for a [ Longfloat ] Porting to HP Prime, it's needless to say that the PRIMARY reason for ANY [ CALCULATOR ] is to CALCULATE ! and besides Symbolic Computation ( already implemented on all contemporary top calculator models ), Arbitrary / [ Variable Precision ] Floating Point Arithmetic is simply The TOP of the TOP ( of the IceCream ) in [ Numerical ] Computation ! ( and, besides Computer Algebra Manipulation routines, one of the Main reasons for the initial development of the major packages like Maple, Mathematica or Maxima ).
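To illustrate what [ Variable Precision ] arithmetic means for the evalf( 1/7, n ) example, here is a small Java sketch that uses java.math.BigDecimal as a stand-in. It is not Giac's Longfloat library and says nothing about how HP Prime would implement it; the class name VariablePrecision and the method oneSeventh are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class VariablePrecision {
    // 1/7 rounded to the requested number of significant decimal digits.
    static BigDecimal oneSeventh(int digits) {
        return BigDecimal.ONE.divide(BigDecimal.valueOf(7), new MathContext(digits));
    }

    public static void main(String[] args) {
        System.out.println(oneSeventh(15)); // 15 significant digits
        System.out.println(oneSeventh(50)); // 50 significant digits
    }
}
```

The caller picks the number of digits per computation, which is the behaviour the post is asking for at the CAS Kernel level.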

Thanks for the Link to the [ HPMuseum.org ] Page with Valuable Details about the Internal Floating Point implementations in both the Home and CAS environments of HP Prime. It's interesting to point out that the HP 49/50g has a [ Longfloat ] Version 3.93 package implementation ( with the Same Name but Distinct Code from the Giac Library ) available at [ http://www.hpcalc.org/details.php?id=5363 ]. Also worth mentioning are the [ Wikipedia ] pages on Arbitrary Precision Arithmetic like [ https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic ], [ https://en.wikipedia.org/wiki/List_of_arbitrary-precision_arithmetic_software ] and [ https://en.wikipedia.org/wiki/List_of_computer_algebra_systems ], and the Xcas/Giac project at [ https://en.wikipedia.org/wiki/Xcas#Giac ] with its Official Site at [ http://www-fourier.ujf-grenoble.fr/~parisse/giac.html ]. It would be a Dream come True if a Fully Integrated Variable Precision Floating Point Arithmetic package, like the Giac [ Longfloat ] Library, were definitively incorporated into the HP Prime CAS Kernel, allowing the Prime to be the First calculator with such a Resource truly incorporated at its [ Kernel ] level ( and not as an optional third party module like the HP 49/50g one, which lacks complete integration with the Kernel, since the HP 49/50g does not have native support for Longfloats ).

128-bit floating point calculations

I'm looking to buy a used SPARC. Which model is the oldest that provides
C or fortran 128-bit floating point calculations? Does every 64-bit CPU
have 128-bit quad real numbers?
Ron

> The AMD 64-bit processor does not support 128-bit
> floating numbers. I need a Sparc processor that
> does.
And yet your question has just what to do with Java -- why did you create a userid and post this question in the Java forums?
> A response "Does every person have ten fingers?"
> shows me that you don't know much about writing
> computer programs that have more than 16 significant
> decimal digits.
I wouldn't infer that - but now that you've proven to be a whiny little snot, I'm sure everyone is just going to want to help you here.

32 bit Floating Point

Hello,
Running FCP 5.1
Having audio sync issues and was double checking my settings.
Although the sequence presets are at 16 bit, they are showing up in the browser as 32bit Floating Point.
Any thoughts?
I generally capture now at 30 minute increments and actually have always had this issue. FCP 4.5 and 5.1
all settings are where they should be.
although I do notice, obviously when the device is off, the audio output defaults to 'default' not to firewire dv.
thanks
iMac intel Mac OS X (10.4.8)

Some more details please. What hardware device are you sourcing the audio clips from? The likely culprit is your capture settings. What preset are you using? Check Audio/Video Settings-Capture Presets and see if the preset you've selected records audio as 32 bit. It will say in the right column after you've selected your preset.
If it says 32 bit there, click Edit to get the Capture Preset Editor. Under Quicktime Audio Settings, the Format field should give you a selection of sample rates and possibly alternate bit depths. If your only choice is 32 bit, (as it is for me when I capture audio via my RME, 32 bit Integer in my case) then you'd be well served by bringing those files into Peak or Quicktime and saving them as 16 bit Integer files to match your sequence settings.
If you've imported these files into FCP from an audio editor that can create 32 bit floating point audio files, eg Kyma, Sequoia, Nuendo, etc. then the same advice applies. The 32 bit files are much larger than they need to be and may upset the apple cart (he he, pun) when pulled into a sequence with different settings. More cpu overhead for sure.
Let us know what you find.

128-bit floating point numbers on new AMD quad-core Barcelona?

There's quite a lot of buzz over at Slashdot about the new AMD quad core chips, announced yesterday:
http://hardware.slashdot.org/article.pl?sid=07/02/10/0554208
Much of the excitement is over the "new vector math unit referred to as SSE128", which is integrated into each [?!?] core; Tom Yager, of Infoworld, talks about it here:
Quad-core Opteron? Nope. Barcelona is the completely redesigned x86, and it’s brilliant
Now here's my question - does anyone know what the inputs and the outputs of this coprocessor look like? Can it perform arithmetic [or, God forbid, trigonometric] operations [in hardware] on 128-bit quad precision floats? And, if so, will LabVIEW be adding support for it? [Compare here versus here.]
I found a little bit of marketing-speak blather at AMD about "SSE 128" in this old PDF Powerpoint-ish presentation, from June of 2006:
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/PhilHesterAMDAnalystDayV2.pdf
WARNING: PDF DOCUMENT
Page 13: "Dual 128-bit SSE dataflow, Dual 128-bit loads per cycle"
Page 14: "128-bit SSE and 128-bit Loads, 128b FADD, 128 bit FMUL, 128b SSE, 128b SSE"
etc etc etc
While it's largely just gibberish to me, "FADD" looks like what might be a "floating point adder", and "FMUL" could be a "floating point multiplier", and God forbid that the two "SSE" units might be capable of computing some 128-bit cosines. But I don't know whether that old paper is even applicable to the chip that was released yesterday, and I'm just guessing as to what these things might mean anyway.
Other than that, though, AMD's main website is strangely quiet about the Barcelona announcement. [Memo to AMD marketing - if you've just released the greatest thing since sliced bread, then you need to publicize the fact that you've just released the greatest thing since sliced bread...]

I posted a query over at the AMD forums, and here's what I was told.
I had hoped that e.g. "128b FADD" would be able to do something like the following:
/* "quad" is a hypothetical 128-bit quad precision */
/* floating point number, similar to "long double" */
/* in recent versions of C++:                       */
quad x, y, z;
x = 1.000000000000000000000000000001;
y = 1.000000000000000000000000000001;
/* the hope was that "128b FADD" could perform the */
/* following 128-bit addition in hardware:          */
z = x + y;
However, the answer I'm getting is that "128b FADD" is just a set of two 64-bit adders running in parallel, which are capable of adding two vectors of 64-bit doubles more or less simultaneously:
double x[2], y[2], z[2];
x[0] = 1.000000000000000000000000000001;
y[0] = 1.000000000000000000000000000001;
x[1] = 2.000000000000000000000000000222;
y[1] = 2.000000000000000000000000000222;
/* Apparently the coordinates of the two "vectors" x & y       */
/* can be sent to "128b FADD" in parallel, and the following   */
/* two summations can be computed more or less simultaneously: */
z[0] = x[0] + y[0];
z[1] = x[1] + y[1];
Thus e.g. "128b FADD", working in concert with "128b FMUL", will be able to [more or less] halve the amount of time it takes to compute a dot product of vectors whose coordinates are 64-bit doubles.
So this "128-bit" circuitry is great if you're doing lots of linear algebra with 64-bit doubles, but it doesn't appear to offer anything in the way of greater precision for people who are interested in precision-sensitive calculations.
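The precision point can be checked with a short Java sketch (Java is used here only because it has a standard arbitrary-precision decimal type; it is not how "128b FADD" is programmed). The class name PrecisionSketch and the method addPairs are invented for illustration, and addPairs is an ordinary loop standing in for the two parallel 64-bit adders.

```java
import java.math.BigDecimal;
import java.util.Arrays;

public class PrecisionSketch {
    // Stand-in for two 64-bit adds done side by side: z[i] = x[i] + y[i].
    static double[] addPairs(double[] x, double[] y) {
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = x[i] + y[i];
        }
        return z;
    }

    public static void main(String[] args) {
        // As 64-bit doubles the trailing digits are lost before any addition.
        double[] x = {1.000000000000000000000000000001, 2.000000000000000000000000000222};
        double[] y = {1.000000000000000000000000000001, 2.000000000000000000000000000222};
        System.out.println(x[0] == 1.0);                   // prints true
        System.out.println(Arrays.toString(addPairs(x, y)));

        // A software decimal type keeps the digits that a true quad-precision
        // adder was hoped to preserve.
        BigDecimal exact = new BigDecimal("1.000000000000000000000000000001");
        System.out.println(exact.add(exact));
    }
}
```

Doubling the width of the datapath changes how many doubles are added per step, not how many digits each double can hold.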
By the way, if you're at all interested in questions of precision sensitivity & round-off error, I'd highly recommend Prof Kahan's page at Cal-Berzerkeley:
http://www.cs.berkeley.edu/~wkahan/
PDF DOCUMENT: How JAVA's Floating-Point Hurts Everyone Everywhere
http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
PDF DOCUMENT: Matlab's Loss is Nobody's Gain
http://www.cs.berkeley.edu/~wkahan/MxMulEps.pdf

Sequence audio format stuck on 32-bit Floating Point

Hi,
The audio format for all my sequences is set to 32-bit Floating Point when I check them in the browser's Aud Format field. When I manually change the format in the sequence settings nothing changes in the browser and choosing a new sequence set up doesn't seem to help. What I don't understand is that there isn't even an option for 32-bit Floating Point in Audio Settings. Anyone else experience this?
I am using v6.0.6 on osx 10.5.8
Thanks,
Tom

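32-bit floating point does not mean two 16-bit channels added together. It describes how each audio sample is represented while the sequence is mixed: every sample is held as a 32-bit floating-point value rather than a 16-bit integer. The browser's Aud Format column reports that mixing format, so it does not follow the 16-bit setting in the sequence preset.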

Photoshop CS3 vs CS2 32 bit floating point tiff loading

Hi,
The application I'm developing exports 3 channel 32 bit floating point tiff's.
The exported files loaded fine in PS CS2, but PS CS3 can't load them anymore; all I get is a message box with this message:
"Could not complete your request because of a problem parsing the TIFF file."
I've uploaded the file here, so you guys can have a look at: www.thepixelmachine.com/dispmap.tiff

Where should I have posted this thread instead? I suppose people here write plugins by programming them (right?), so they are supposed to be programmers, not regular PS users. I thought a programmer would be much more familiar with programming-related issues, such as C++ routines for saving TIFF files. Correct me if I'm wrong.
The people who tested the file have Photoshop CS3 and didn't manage to open it, they even sent me screenshots with the error message box. In CS2 however, the file loads perfectly. Also, the single strip version opened just fine in CS3.
You may close this thread, the problem was solved and more than that it looks like I haven't posted it in the right forum.
Thanks.

Help me changing 32-bit floating point into 16bit

When I open a new project, the browser gives the following information:
Audio format: 32-bit floating point.
Even when I changed the sequence settings (under sequence > settings, or under final cut pro > audio/video settings > sequence presets > edit) to 16 bit, the 32-bit floating point in the browser stays the same.
I would love to change this into 16 bit. Can anybody help me?
Thnx

You can't change it. FCP is 32-bit float internally. What are you trying to do by changing it?

Using iMovie footage in FCP: 32-bit floating point audio?

I'd like to use iMovie 7 to capture my DV footage because I like its cataloging and skimming features, and the way it splits up clips based on DV stops. I want to do my editing in Final Cut Pro 5. Unfortunately, I've found that the audio depth in the clips I've captured with iMovie is 32-bit floating point, rather than 16-bit integer, and FCP has to render the audio before it can play back. Strange, since the FCP browser says that the sequence I dropped the iMovie footage into IS 32-bit. Any ideas on how to get FCP to playback this bit depth without rendering, and without having to convert my footage?

Do a search here and you will find all kinds of posts on this very subject.
The main problems with using iMovie to capture for FCP are
1. iMovie and FCP capture in very different ways. iMovie captures footage as a DV stream, problematic for editing in FCP
2. iMovie captures will not give you the timecode from your tape... may or may not be important to you unless you ever need to recapture and reconnect the footage.
In short, if you are editing in FCP... capture with FCP. If you want to split the footage based on tape start/stop, use the Start Stop detect function after you have captured.
rh

32 bit floating point ... SLOW...

Hi,
I ran a little test because i found Motion took too much time exporting with 32 bit floating point.
I made a single layer, animated text in Motion with my Quad ( 2.5 G RAM)
When i exported 32 bit floating point QT Animation in Motion, it was very very slow and the CPU were running at 10 to 15%.
When i export 8 bit floating point, it is much faster but CPU run at about 20%.
BUT
In FCP, when i render 8 bit Motion project and .mov (from 8 bit), or 32 bit Motion project and .mov (from 32 bit), they all render pretty fast...
8 bit Motion prj 45% CPU
.mov from Motion 8 bit 30% CPU
32 bit Motion prj 45% CPU
.mov from Motion 32 bit 60% CPU
Why that much difference ?
I dont understand why the CPU are running higher with a .mov (QT Animation) that has been created in 32 bit floating point ?
I thought that once it has been created (self contained) it did not matter...
thanks

32 bit floating point refers to how it will be rendered. Has nothing to do with the format itself. You're OK ... just edit.
32-bit floating point allows audio calculations, such as fader levels and effects processing, to be
performed at very high resolution with a minimum of error, which preserves the quality of your digital audio.
Jerry
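A toy Java sketch of why fader moves lose less in floating point: the level is halved a few times and then doubled back, once with every intermediate result stored as an integer and once as a 32-bit float. The class name GainRoundTrip and both method names are made up for this example, and the truncating integer division is an assumption; real audio software may round or dither instead.

```java
public class GainRoundTrip {
    // Halve `steps` times, keeping each result as an integer sample,
    // then double the same number of times.
    static short int16RoundTrip(short sample, int steps) {
        int v = sample;
        for (int i = 0; i < steps; i++) {
            v = v / 2;   // integer division drops the fractional part
        }
        for (int i = 0; i < steps; i++) {
            v = v * 2;
        }
        return (short) v;
    }

    // Same fader moves with the sample kept as a 32-bit float.
    static float floatRoundTrip(short sample, int steps) {
        float v = sample;
        for (int i = 0; i < steps; i++) {
            v *= 0.5f;
        }
        for (int i = 0; i < steps; i++) {
            v *= 2.0f;
        }
        return v;
    }

    public static void main(String[] args) {
        short sample = 1001;
        System.out.println(int16RoundTrip(sample, 3)); // prints 1000
        System.out.println(floatRoundTrip(sample, 3)); // prints 1001.0
    }
}
```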
