High Throughput Divide Inverted?

I am trying to optimize a large FPGA VI and was doing some simple tests to determine FPGA usage and timing for some high use items. I was comparing the high throughput divide with the normal one. I am using a cRIO-9066. What I found was that the inputs to the high throughput divide, in this case, appear to be reversed.
The equivalent normal divide works as expected, although it does take more resources and time. Has anyone else seen this? If so, is there a workaround?

Forgot to add my versions. I am using a fully patched 2014SP1 stack (LabVIEW, FPGA, RIO).
I have also since tested the high throughput divide in pipelined mode, and it has the same issue there, as well.
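For anyone wanting to reproduce this, a quick host-side check makes swapped operands obvious: feed the node an asymmetric operand pair and compare the result against both orderings. A minimal sketch in Python (the function names here are illustrative, not NI APIs):

```python
# Hypothetical host-side sanity check for a hardware divide node: feed an
# asymmetric operand pair and see which ordering the result matches.

def check_divide(divide_fn, x=10.0, y=4.0, tol=1e-6):
    """Return 'x/y', 'y/x', or 'neither' for the given divide function."""
    result = divide_fn(x, y)
    if abs(result - x / y) < tol:
        return "x/y"
    if abs(result - y / x) < tol:
        return "y/x"
    return "neither"

# A divide whose inputs are wired backwards, as described above:
swapped = lambda x, y: y / x
print(check_divide(swapped, 10.0, 4.0))  # reports 'y/x'
```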

Similar Messages

  • FPGA quick questions: High Throughput Division vs. Multiplication Implementation (rounding?)

    Hi all,
    I'm trying to implement a simple routine in FPGA where I divide an FXP number by 7. I wanted to use the high throughput division, but it seems to round to the nearest integer even though the output is capable of representing fractions. Alternatively, I can multiply my number by 1/7 using the high throughput multiplication, and I get what I want. I'm not too familiar with FXP arithmetic. Without fully understanding the problem, I at least have a solution, which is to use the multiplication; I'd just like to know a little more. Can anyone please shed some light on why the division rounds even though the output can handle fractions?
    Thanks for your help
    Jeffrey Lee
    Attachments:
    highthroughputdivisionormultiply.png (31 KB)

    Thanks for the suggestions. I recreated this and indeed was able to get the correct results! So what happened?
    This may blow your minds, but there is something inherently wrong with my x/y indicator. I have it set to "adapt to source". I created another supposedly identical indicator ("x/y 2") off the same wire and got the correct result with that indicator. This seems like some kind of bug, but it worries me because I should never have run into it.
    I've attached a screenshot of the code in action as well as the VI (I'm using 2011).
    Thanks
    Jeffrey Lee
    Attachments:
    highthroughputdivisionormultiply_2.png (52 KB)
    highthroughputdivideIssue.vi (21 KB)
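On the original rounding question, the behavior is easy to reproduce off-target: if the divide's output type carries no fractional bits, the quotient is quantized to an integer, while multiplying by a quantized reciprocal keeps the fraction. A rough Python emulation of FXP quantization (illustrative only; not LabVIEW's exact rounding mode):

```python
# Emulating fixed-point (FXP) quantization in plain Python to show why a
# divide whose output is configured with zero fractional bits rounds to an
# integer, while multiplying by a quantized 1/7 preserves fractions.

def to_fxp(value, frac_bits):
    """Quantize a value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

x = 10.0
# Divide with an integer-only output type (0 fractional bits):
print(to_fxp(x / 7, 0))        # 1.0 -- the fraction is lost
# Multiply by a 16-fractional-bit approximation of 1/7:
recip = to_fxp(1 / 7, 16)
print(to_fxp(x * recip, 16))   # ~1.4285 -- the fraction survives
```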

  • How can i do a sine wave with High Throughput Sine for fpga?

    I'm trying to create a sine wave by following the examples, but I can't get it to work. I have an sbRIO-9606.
    I need to generate a sine wave with the high throughput block. In my exercise I have 2 inputs (amplitude and frequency) and I have to watch the output (sine wave).
    I need help!!

    Hi pablosan,
    If I understood correctly, you want to generate a sine wave with a high throughput block in LabVIEW FPGA.
    I'm afraid you won't be able to do this, as these blocks are specifically designed for other FPGA targets with better features for high performance. So instead of using the example named "Sine and Cosine.lvproj" under the High Throughput examples, you can use "Sine Wave.lvproj" under the Signal Generation and Processing examples, which is better suited to your Single-Board RIO.
    Regards.

  • Low latency, high throughput RMI, Any suggestions welcome.

    I am working on a module which is designed to support low latency synchronous RMI and high throughput asynchronous RMI.
    It uses TCP, NIO, custom serialization, custom RMI, Java 5 concurrency thread pools.
    Its current latency is 5 to 15 microseconds above ping times (depending on the arguments/return values). For throughput, it gets 80K-140K calls per second on a single channel.
    I have looked at JGroups and Stream.
    I was wondering if anyone has tried this sort of thing, what suggestions you might have, and what open source libraries I might use for comparison.

    ejp wrote (exchange below interleaves my points, "Me", with ejp's replies):
    Me: Using custom serialization, most messages are under 256 bytes; if I send smaller packets of 128 bytes, it doesn't make much difference.
    ejp: I wouldn't expect it to make any difference. An Ethernet packet is around 1500 bytes. Whatever amount you send below that is still an Ethernet packet.
    Me: The maximum packet size is around 1500 bytes; however, it won't send more data than needed. If you send ping with different packet sizes you will see different average latencies.
    Me: I have used the generic RMI Proxy class, and use simple reflection to look up methods to call on the server. I believe I can cut 2-4 microseconds if I generate my own proxies and method callers. This is a bit of work generating byte code for not much saving.
    ejp: You will save time by not using reflection at all.
    Me: Correct, but for a generic implementation I need to use something.
    Me: I was also wondering how much work had been done using non-blocking RMI calls.
    ejp: RMI calls aren't non-blocking. Even if the methods are void they still have to return a success/failure indication, e.g. 'true' or an exception. You can make them asynchronous by using more threads, but that's more cost.
    Me: This is a custom implementation, not the built-in Java RMI, which is much slower (and a pain to use IMHO). When a callback is provided as the first argument, the result is processed in another thread as it becomes available. In this way it can support many concurrent requests over the same channel. Yes, an additional thread is required and there is a small impact on latency; however, asynchronous calls can reduce the impact of latency on an application.
    ejp: CORBA supports async calls and you may be able to get at that via Java IDL and idlj. Not with RMI/IIOP though.
    Me: This is how I get the high throughputs on a single channel.
    ejp: But you're not in control of the number of channels. RMI will use as many channels (i.e. TCP connections) as it needs to execute whatever remote calls are in progress. Making your own calls 'nonblocking' or async doesn't really affect that.
    Me: I wrote the custom RMI so I make sure I control the number of channels. :) In fact I use different channels for synchronous calls, asynchronous calls, and asynchronous events (where the server pushes events to the client) as required.
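The callback-style asynchronous call described above can be sketched in a few lines (plain Python with a thread pool, not the poster's Java implementation): when a callback is supplied, the call returns immediately and the result is delivered on a worker thread, which is what lets many requests overlap on one channel.

```python
# Sketch of a synchronous/asynchronous invoke, assuming the caller passes
# an optional callback. Names are illustrative, not any real RMI API.
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=1)  # one worker drains the channel

def invoke(method, *args, callback=None):
    if callback is None:
        return method(*args)               # synchronous: lowest latency
    future = _pool.submit(method, *args)   # asynchronous: higher throughput
    future.add_done_callback(lambda f: callback(f.result()))
    return None

results = []
invoke(lambda a, b: a + b, 1, 2, callback=results.append)
_pool.shutdown(wait=True)                  # wait for pending callbacks
print(results)  # [3]
```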

  • Questions about the High Throughput Math Functions

    Hello,
    I'm just trying to understand what advantages the High Throughput Math functions have, so I have a few questions.
    I'm always talking about being inside an SCTL.
    1. In the image you can see four Add functions. The one with U32 should use more resources than the one with the U16 data type, which uses more than the U8. But does my FXP High Throughput Math function use fewer resources than the U8 version?
    2. Which of these four Add functions will take the least time to execute?
    3. If I were to add two 32-bit numbers, one with the normal Add and one with the High Throughput Add, which of the functions would use fewer resources, and which would be faster?
    4. How would it be if I had a multiplication instead? If I understand the concept of a multiplication right, it will be done with a DSP48E. This logic block is capable of multiplying a 25-bit number by an 18-bit number. So the U32 multiply will use 2 DSP48Es and the other three functions would use one DSP48E.
    I guess the U32 version will have the slowest execution?
    What about the other three: will their execution speed be equal, or will the versions with smaller data types be faster?
    With kind regards
    Westgate

    I don't see a big rush to answer this, so I'll give it a shot:
    1. The HT version uses fewer resources, but only because it is configured with the smallest data types. You should get exactly the same results with the same data types and an Add function. The only difference with the HT version is the ability to specify an output register, and the handshaking signals that account for that delay. If the add is implemented in a DSP48, the integrated register can result in better timing, but in practice it is usually equivalent to an Add function followed by a feedback node.
    2. The actual delay through an add is proportional to the number of bits, where the critical path is the sequentially computed carry chain. So you could run the last one at the highest clock rate. The FPGA has dedicated fast carry logic, so the difference isn't too significant.
    3. The first one will be VERY slightly smaller and faster, just because you're computing one extra output bit on the second one.
    4. I would expect the speed to depend only on the number of DSP48s used, so the last 3 should be similar. You'd be likely to see different results in practice, though, due to routing differing numbers of bits to registers for the indicators. This assumes you're not taking advantage of any of the pipelining configuration options in the HT Multiply. Those options, and the associated handshaking signals, are really what differentiates the HT versions from the regular numeric functions. They allow you to achieve higher clock rates and throughput at the expense of latency (i.e., it will take more clock cycles to produce a valid result, but you can get more data through the function in a given amount of time).
    Caveats: All your examples have constant inputs, so the LabVIEW compiler and/or Xilinx tools can and will optimize them to no ops. Small multiplies, multiplies with one constant input, or those just larger than 25x18 may also use some non-DSP48 logic for all or part of the implementation. Note that the HT palettes provide a DSP48E function in case you want control over exactly how a multiply and/or add gets implemented. Placing and routing can result in unexpected behaviors, so estimating timing is much more difficult than simply adding up component delays.
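The partial-product decomposition behind the DSP48 counts in question 4 can be sketched as follows. A DSP48E multiplies at most 25x18-bit signed operands, so a wider multiply is split into narrower partial products that are shifted and summed. This illustrative Python version slices only one operand (a real implementation may slice both):

```python
# Rough sketch of why a 32x32 multiply needs more than one DSP48E: wide
# operands are split into slices, each slice multiplied separately, and
# the shifted partial products summed back together.

def mul32_via_partials(a, b):
    """32x32 unsigned multiply built from 17-bit slices of b (illustrative)."""
    assert a < 2**32 and b < 2**32
    b_lo = b & 0x1FFFF          # low 17 bits of b
    b_hi = b >> 17              # high 15 bits of b
    # Two partial products, recombined with a shift-and-add:
    return a * b_lo + ((a * b_hi) << 17)

x, y = 0xDEADBEEF, 0x12345678
print(mul32_via_partials(x, y) == x * y)  # True
```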

  • High Throughput Complex Multiply Implementation resource

    There are two options for the implementation resource of a High Throughput Complex Multiply block in LabVIEW FPGA (Auto and Look-Up Table). What is the difference between the two, and what is the advantage of one over the other? Which consumes fewer FPGA resources?

    nmbernardo,
    Auto specifies that the compiler decides whether to use embedded block multipliers or look-up tables (LUTs) to implement the multiplier, whereas Look-Up Table specifies that the function always uses LUTs. Selecting the LUT option might increase the clock rate at which this function can compile.
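As an aside on where the multiplier resources go: a complex multiply is assembled from real multipliers, and the direct form needs four of them, while the classic Gauss/Karatsuba-style refactoring needs only three at the cost of extra adders. A small sketch of that identity (not NI's documented implementation, just the textbook trade-off):

```python
# (ar + ai*j) * (br + bi*j) computed two ways: the direct 4-multiply form
# and the 3-multiply refactoring that trades a multiplier for extra adds.

def cmul_4mult(ar, ai, br, bi):
    return (ar * br - ai * bi, ar * bi + ai * br)

def cmul_3mult(ar, ai, br, bi):
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return (k1 - k3, k1 + k2)

# (3+4j) * (5+6j) = -9 + 38j, either way:
print(cmul_4mult(3, 4, 5, 6) == cmul_3mult(3, 4, 5, 6))  # True
```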

  • FPGA high throughput multipliers

    When should you use these multipliers rather than ordinary multiplies? Are there any drawbacks to using them, i.e., do they take more gates to implement?
    What about high throughput additions?
    Thanks

    You might find this link from the help information useful: Using the High Throughput Math Functions
    In particular: "National Instruments recommends that you use the LabVIEW Numeric functions unless you need the benefits that the High Throughput Math functions provide."  I don't know if there's any penalty for using them when not needed.  For additions and multiplies, the major advantage to the high-throughput operations is that you can more easily control pipelining when executing a series of mathematical operations. (EDIT: but if you don't need to pipeline mathematical operations, then there's no benefit.)

  • Very high contrast or inverted colors during scree...

    I had this same problem on my 1st 1520, and it is repeating on the new replacement as well. So far I've seen it twice in the last 10 days, the latest 15 minutes ago. When I unlock the screen, the lock screen goes all weird.
    It looks like high contrast, or as if the colors were inverted. I tried to take a screenshot, but it corrected itself before I could get the screenshot off.
    Anyone else have or had this issue? Something to be concerned about?
    I've seen a similar post here - http://forums.wpcentral.com/nokia-lumia-928/253914-very-high-contrast-screen.html - I'm experiencing the same kind of screen.
    Attachments:
    1520Screen_Glith.jpg (51 KB)

    Hey guys, I've gone through 5 of the cyan Nokia Lumia 1520s. They have a problem that AT&T will not address or issue a recall for, because not enough of us are reporting it. It's either the graphics card or the screen. I'll let y'all know: no matter which replacement cyan you turn in, they will not admit the device is faulty. They all do it, even the brand new ones you get from the store. Eventually the screen bleaches and the colors invert. My phone works fine otherwise; the colors are just messed up. If you can deal with your nearly $700 phone's screen looking funny, then by all means let's not complain to Nokia about their phone. Unfortunately AT&T doesn't make the phone, they just sell it, so tell Nokia about the cyan problem and we should have our problems resolved.

  • High throughput on access points

    Hi,
    APs today support 802.11n (up to 300 Mbps) throughput. Is there any specific configuration to enable 802.11n to be used as a backend link for the usual 802.11a links, or does the AP auto-negotiate? What factors determine whether the AP will use 802.11n as its backhaul link?
    Thanks in advance!

    "One of the factors that determines the data rate is the power of the device's Wi-Fi radio transmitter/receiver."
    This means that client devices can come with various TX power levels. If you look at the data sheet for client wireless cards, you might see various output powers, for example 100mW, 50mW or 25mW. There are even some in the 200-300mW range.
    When you implement a wireless infrastructure, you want to match the power of the lowest client you have. Many access points can be configured to the max for the country you're in (allowed by regulation). So if your access point can achieve 100mW but you have clients that are 50mW, you may want to lower the power of the access point to 50mW.
    TX power on the access point will affect the coverage area: the higher the power, the more the coverage. If you overpower the clients, then the clients can hear the AP, but the AP might not hear the clients. That's why it's important to adjust the power on the AP to your lowest client.
    Thanks,
    Scott
    Help out others by using the rating system and marking answered questions as "Answered"

  • Skills/Knowledge needed for low latency/high throughput development

    I have a meeting/technical interview next week for a department at work (foreign exchange). It is going to be a technical interview. I know from the team that they use Java heavily, with JVM optimisation and modifications to garbage collection, and have servers placed as close to the exchanges as possible to minimise latency. They said they need to work in "microseconds" as opposed to milliseconds, which means being as efficient as possible.
    I love Java development, but am relatively inexperienced, and I really want this job. What would you suggest needs to be researched for a role like this in order to stand the best chance of getting it? I don't think knowing about inheritance or auto boxing/unboxing is going to help much here.
    I am thinking of looking at new additions to the Java platform, such as closures, to demonstrate I am keeping up to date with current trends, but as for the rest of it I am not really sure.
    I would really appreciate some pointers around considerations for low latency / high volume / highly concurrent development in Java if possible.
    Just for a little more detail, the backend uses a KDB database with a JavaFX front end.
    Thanks

    ea33334c-b8a8-437b-9807-a170194a1950 wrote:
    it is part of my graduate placement. I have to do a rotation to a new team. I hope you were only so blunt because I forgot to mention this?
    How is any of what you just said relevant? I was 'so blunt' because you seem to be setting yourself (and your potential new teammates) up for failure. Based on what you posted you are nowhere near qualified for the task you described.
    Further there is absolutely NOTHING in what you posted that talks about any skills that you DO have. You didn't mention A SINGLE THING about your skillset or how you might add value to that team or project.
    Your educational experience should provide some guidelines for how you advance your skills in ANY subject. Each college course has prerequisites and for good reason. Taking a course when you don't have the proper foundation and background is the surest way to fail. Colleges don't let you do it. You have likely been in classes where some of your classmates were clearly in over their head. For those people that course will be nothing but headache and heartache and their chances of success are minimal.
    It is the same with most endeavors including the one you mention in your thread. Naturally you want to challenge yourself when you join a new project or team but you have to be able to hold your own and contribute. Taking on a task or project when you don't have the necessary experience will not only subject you to unnecessary problems but you can easily have a large negative impact on the entire team and project.
    I suggest you try to find a different project where whatever (still unknown to us) skills you have can be used to contribute to the team effort. No one expects new team members to know everything or as much as more experienced developers but they also don't want an 'anchor' that is going to drag them down.

  • High throughput design guide?

    Is anyone aware of whether there is a Unified Wireless DG focusing on maximizing data access speed for mixed clients?
    I've been searching for something I could FWD with things such as allowed rates, AP placement, a/b/g radio configuration, etc. but did not turn up much.
    Thanks,
    Erik

    That's a complicated question. Take a look at this post from a few days ago that discusses a number of options...
    http://forum.cisco.com/eforum/servlet/NetProf?page=netprof&forum=Wireless%20-%20Mobility&topic=General&CommCmd=MB%3Fcmd%3Ddisplay_location%26location%3D.2cbea1a5

  • What kind of throughput should I expect? Anyone using AQ in high volume?

    Hi,
    I am working with AQ in a 10.2 environment and have been doing some testing with it. What I have is a very simple queue with 1 queue table. The queue table structure is:
    id number
    message varchar(256)
    message_date date
    I have not done anything special with storage parameters, etc., so it's all default at this point. Then I created a stored procedure that will generate messages given the message text and a number of times to loop. When I run this procedure with 10,000 iterations it runs in 15 seconds (if I commit all messages at the end) and 24 seconds if I commit after each message (probably more realistic).
    Now, on the same database I have a plain table that contains one column (message varchar(256)). I have also created a similar stored procedure to insert into it. For this, 10,000 inserts take about 1 second.
    As you can see, there is an order of magnitude of difference, so I am looking to see if others have been able to achieve higher throughput than 500-700 messages per second and, if so, what was done to achieve it.
    Thanks in advance,
    Bill

    Yes, I have seen it. My testing so far hasn't even gotten to the point of concurrent enqueue/dequeue. So far I have focused on enqueue time, and it is dramatically slower than a plain old database table. That link also discussed multiple index-organized tables being created behind the scenes. I'm guessing that the 15X factor I am seeing is because of 4 underlying tables, plus they are index-organized, which adds additional overhead.
    So my question remains - is anyone using AQ for high volume processing? I suppose I could create a bunch of queues. However, that would create additional management on my side, which is what I was trying to avoid by using AQ in the first place.
    Can one queue be served by multiple queue tables? Can queue tables be partitioned? I would like to minimize the number of queues so that the dequeue processes don't have to contain multiplexed logic.
    Thanks
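For reference, the numbers quoted above work out as follows:

```python
# Back-of-the-envelope throughput from the tests described in this thread:
enqueue_rate    = 10_000 / 15   # AQ enqueue, commit at end: ~667 msg/s
enqueue_rate_pc = 10_000 / 24   # AQ enqueue, commit per message: ~417 msg/s
insert_rate     = 10_000 / 1    # plain table insert: ~10,000 rows/s
print(f"{insert_rate / enqueue_rate:.0f}x slower")  # ~15x, matching the post
```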

  • Low udp tx throughput

    Hi, gurus
    I have a quite nice Sun x4600 box with 8x2 AMDs:
    $ less /etc/release
    Solaris 10 8/07 s10x_u4wos_12b X86
    Assembled 16 August 2007
    I'm testing the 1 GbE network of this cluster (e1000g driver) with netperf.
    Symptoms are:
    - running 1 UDP send test (UDP_STREAM, one thread, 1472 bytes, 64K buffers) I get less than 100 Mbit/s (instead of 1000!)
    - running 1 TCP sender (similar params) I get about 900 Mbit/s (good)
    The UDP test uses about 0.4% of the CPUs; in contrast the TCP sender uses 1.2% (checked naively with prstat).
    Also, the UDP sender increases the "Tx Desc Insufficient" e1000g kstat counter by about 5,000 while sending 1,000,000 datagrams over a minute.
    Paired runs give more interesting results:
    - two TCP senders both get high throughput (more than 400 Mbit/s each) and high CPU;
    - two UDP senders both get low throughput (40 Mbit/s) and low CPU utilization;
    but when a TCP and a UDP sender run simultaneously, it looks like the UDP sender is a "CPU vampire" (or the TCP sender acts as steroids for the UDP send):
    if I run the UDP sender for a short period during a long-running TCP send, the TCP sender, which had started quite well, degrades while the UDP sender runs, and returns to gigabit throughput after the UDP sender finishes. During that short run the UDP sender achieves quite good throughput!!!
    Vice versa, running the UDP sender for a long time gets low throughput and CPU utilization; a short run of the TCP sender increases the UDP sender's throughput (only while the TCP sender runs) and its CPU utilization, while the TCP sender itself gets low throughput... damn.
    In two words: UDP senders equally divide low throughput, TCP senders equally divide high throughput, but a UDP sender completely beats a TCP sender in CPU contention - it gets high throughput and CPU utilization while the TCP send process degrades to low values.
    What's the reason for this? How can I get around this issue for my app in production? Is it a feature or a bug?
    Thanks for your attention
    Michael
    PS. It seems that my network is fine (1 GbE, appropriate throughput); I just have a problem with getting data into the NIC.

    Sorry for the momentary hijack. I thought "Mbps" was the same as "MB/s." What's the difference?
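To answer the hijack: Mbps is megabits per second, while MB/s is megabytes per second. A byte is 8 bits, so divide the bit rate by 8 to get the byte rate:

```python
# Mbps (megabits/s) vs MB/s (megabytes/s): they differ by a factor of 8.
def mbps_to_mb_per_s(mbps):
    return mbps / 8

print(mbps_to_mb_per_s(1000))  # a 1 Gbps link moves at most 125 MB/s
```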

  • High performance (low level) jdbc access routines

    Does anyone know of any low level jdbc access routines?
    My need is to be able to do high speed data access. I'm looking for something 2 orders of magnitude faster than what is available with the typical use of java.sql (connection, preparedstatement, resultset and so on).
    Several years ago while using visual basic and odbc, I had to use direct calls to the odbc32.dll functions for data access to get the performance I wanted rather than the ado and rdo stuff. I am wondering what is available with jdbc - perhaps the routines that are used when writing jdbc drivers themselves. I don't know if there is an equivalent "gateway" that all jdbc drivers use (like the odbc32.dll that is used with all odbc drivers in windows)
    Any comments would be appreciated. Even just hints as to where I might find this kind of information.

    > Does anyone know of any low level jdbc access routines?
    What are you expecting besides what is provided in the java.sql interfaces? The JDBC spec spells out what the interfaces are. There isn't anything else to call.
    > My need is to be able to do high speed data access. I'm looking for something 2 orders of magnitude faster than what is available with the typical use of java.sql (connection, preparedstatement, resultset and so on).
    Faster than what driver talking to what database over what network?
    > Several years ago while using visual basic and odbc, I had to use direct calls to the odbc32.dll functions for data access to get the performance I wanted rather than the ado and rdo stuff. I am wondering what is available with jdbc - perhaps the routines that are used when writing jdbc drivers themselves.
    You'll have to write your own driver.
    > I don't know if there is an equivalent "gateway" that all jdbc drivers use (like the odbc32.dll that is used with all odbc drivers in windows)
    Since Java's platform-independent, there's no easy out like taking advantage of OS calls.
    > Any comments would be appreciated. Even just hints as to where I might find this kind of information.
    Sounds to me like you want to write your own driver, and when you're done it will only be good for one database and operating system.
    I'd have to ask what made you so certain that database calls were the bottleneck in your app. For the typical Web app, a reasonably well written persistence layer will be fast enough. There's other network latency in the system, and UI response can be slow enough to keep up. Are you worried about high throughput to the database for a Web app? If so, you might be guilty of premature optimization.
    Unless I was absolutely certain that the JDBC driver from my database vendor would not do the job, I'd write the app and then profile it to find out where the performance bottlenecks were. I wouldn't take this extreme step until I was certain that the driver was the problem and a custom version would fix it.
    %
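One concrete, driver-agnostic step worth trying before writing a custom driver is statement batching, so each network round trip carries many rows; JDBC exposes this as PreparedStatement.addBatch()/executeBatch(). The sketch below shows the same idea with Python's DB-API executemany against an in-memory SQLite database, purely for illustration:

```python
# Batched inserts: one executemany call instead of 10,000 single-row
# round trips. Table and column names are made up for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msg (id INTEGER, body TEXT)")
rows = [(i, f"message {i}") for i in range(10_000)]
conn.executemany("INSERT INTO msg VALUES (?, ?)", rows)  # one batched call
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM msg").fetchone()[0])  # 10000
```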

  • Question on replication/high availability designs

    We're currently trying to work out a design for a high-availability system using Oracle 9i release 2. Having gone through some of the Oracle whitepapers, it appears that the ideal architecture involves setting up 2 RAC sites using Dataguard to synchronize the data. However, due to time and financial constraints, we are only allowed to have 2 servers for hosting the databases, which are geographically separate from each other in prevention of natural disasters. Our app servers will use JDBC pools to connect to the databases.
    Our goal is to have both databases be the mirror image of each other at any given time, and the database must be working 24/7. We do have a primary and a secondary distinction between the two, so if the primary fails, we would like the secondary database to take over the tasks as needed.
    The ability to query existing data is mission critical. The ability to write/update the database is less important, however we do need the secondary to be able to process data input/updates when primary is down for a prolonged period of time, and have the ability to synchronize back with the primary site when it is back up again.
    My question now is which replication technology should we try to implement? I've looked into both Oracle Advanced Replication and Dataguard, each seems to have its own advantages and drawbacks:
    Replication - can easily switch between the two databases using multimaster implementation, however data recovery/synchronization may be difficult in case of failure, and possibly will lose data (pending implementation). There has been a few posts in this forum that suggested that replication should not really be considered as an option for high availability, why is that?
    Dataguard - zero data loss in failover/switchover; however, manual intervention is required to initiate failover/switchover. Once the primary site fails over to the standby, the standby becomes the primary until a DBA manually goes back in and switches the roles. In Oracle 10g release 2, automatic failover appears to be achieved through the use of an extra observer piece. There does not seem to be any way to do this in Oracle 9i release 2.
    Being new to the implementation of high-availability systems, I am at somewhat of a loss at this point. Both implementations seem to be a possible candidate, but we will need to sacrifice some efforts for both of them also. Would anyone shine some light on this, maybe point out my misconceptions with Advanced Replication and Dataguard, and/or suggest a better architecture/technology to use? Any input is greatly appreciated, thanks in advance.
    Sincerely,
    Peter Tung

    Hi,
    It sounds as if you're talking about the DB_TXN_NOSYNC flag, rather than DB_NOSYNC.
    You mention that in general, you lose uncommitted transactions on system failure. I think what you mean is that you may lose some committed transactions on system failure. This is correct.
    It is also correct that if you use replication you can arrange to have clients have a copy of all committed transactions, so that if the master fails (and enough clients do not fail, of course) then the clients still have the transaction data, even when using DB_TXN_NOSYNC.
    This is a very common usage scenario for Berkeley DB replication/HA, used to achieve high throughput. You will want to pay attention to the configured ack policy, group size setting, setting of the 2SITE_STRICT option (if group size == 2).
