Distributed computing theory

All else being equal, and keeping it as simple as possible, what would be a good solution to this problem? A company wants to set up an e-commerce website (Tomcat), but the main database is located at the main headquarters for internal use only. The website would be located off site at a server farm. What is the best way to share data between the two? Would this be a proper solution:
SQL database located at the main HQ along with a J2EE server such as JBoss (is this what it is used for?). The website does not connect to any SQL database at the off-site location. Instead it connects to the JBoss server at headquarters. Is this a widely used method, considering that if the connection between HQ and the off-site location went down, the external e-commerce website would not work? In addition, what type of connection should be used between the two? I know that there are many solutions, but I would like to hear a suggestion or two so I can research this type of problem better.
Thanks

> All else being equal, and keeping it as simple as possible, what would be a good solution to this problem? A company wants to set up an e-commerce website (Tomcat), but the main database is located at the main headquarters for internal use only. The website would be located off site at a server farm. What is the best way to share data between the two? Would this be a proper solution:
> SQL database located at the main HQ along with a J2EE server such as JBoss (is this what it is used for?).
JBoss is an open source J2EE app server, of course. I have no idea what your HQ site is using it for.
> The website does not connect to any SQL database at the off-site location. Instead it connects to the JBoss server at headquarters.
I don't think this would be a good way to go:
(1) I'd imagine that the extra network latency would be a terrible performance hit.
(2) Read-only data might be okay to access this way, but an e-commerce Web site implies that the database will be modified continuously for as long as the site is up.
(3) I think this sounds like an insecure arrangement. I don't think I'd want to do business with that e-commerce site.
> Is this a widely used method, considering that if the connection between HQ and the off-site location went down, the external e-commerce website would not work?
That's another problem, of course. Now you have 2 points of failure instead of just one.
> In addition, what type of connection should be used between the two? I know that there are many solutions, but I would like to hear a suggestion or two so I can research this type of problem better.
I think it sounds more like a synchronization problem. Unless HQ absolutely HAS to have up-to-the-minute data from the e-commerce Web site, I'd suggest something like this:
(1) Synchronize the two databases at the start of business using HQ as the master and e-commerce as the slave,
(2) Allow e-commerce to use its local database during business hours to transact business,
(3) Upload data from e-commerce back to HQ, synchronizing at the end of business hours.
All this should happen over a secure, reliable connection between the two sites, NOT the Internet. It'd be best to use mirroring software provided by your database vendor.
The only problem is the definition of "business hours". If your e-commerce site is like Amazon.com, there IS no close of the business day. You'll have to think through that synchronization carefully.
But if you can get away with synching round trip once per day, this is what I'd do.
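The three-step round trip suggested above can be sketched in a few lines of Java. This is only an in-memory illustration, not vendor mirroring software: the class DailySyncSketch, its Order type, and the stock-level maps are all hypothetical stand-ins for the two real databases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the once-a-day round trip; no real database is involved. */
class DailySyncSketch {
    /** Hypothetical order record written by the web site during the day. */
    static final class Order {
        final String sku;
        final int quantity;
        Order(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    // HQ holds the master copy of the stock levels, keyed by SKU.
    final Map<String, Integer> hqStock = new HashMap<>();
    // The site works only against its local copy while it is open.
    final Map<String, Integer> siteStock = new HashMap<>();
    final List<Order> siteOrders = new ArrayList<>();

    /** Step 1: start of business, copy the master data down to the site. */
    void pushMasterToSite() {
        siteStock.clear();
        siteStock.putAll(hqStock);
    }

    /** Step 2: during the day the site sells from its local copy only. */
    boolean placeOrder(String sku, int quantity) {
        int available = siteStock.getOrDefault(sku, 0);
        if (quantity <= 0 || quantity > available) {
            return false; // rejected locally, HQ is never contacted
        }
        siteStock.put(sku, available - quantity);
        siteOrders.add(new Order(sku, quantity));
        return true;
    }

    /** Step 3: end of business, replay the day's orders against the master. */
    int uploadOrdersToHq() {
        int applied = 0;
        for (Order order : siteOrders) {
            hqStock.merge(order.sku, -order.quantity, Integer::sum);
            applied++;
        }
        siteOrders.clear();
        return applied;
    }

    public static void main(String[] args) {
        DailySyncSketch day = new DailySyncSketch();
        day.hqStock.put("widget", 10);
        day.pushMasterToSite();
        day.placeOrder("widget", 3);
        day.placeOrder("widget", 20); // rejected: more than the local copy holds
        int applied = day.uploadOrdersToHq();
        System.out.println(applied + " order(s) uploaded, HQ stock = " + day.hqStock.get("widget"));
    }
}
```

The sketch also shows where the "no close of business" problem bites: anything HQ changes between steps 1 and 3 is invisible to the site until the next morning's push.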

Similar Messages

Data warehouse with Oracle or distributed computing

Hi All,
When I talked with people at some big companies about data warehousing (ETL especially), all of them said they much prefer distributed computing with Hadoop or Hive over Oracle.
When they have huge amounts of data to process per day (say n TB), Oracle or any other relational database doesn't cope very well.
I have only one DW project's worth of experience, which was implemented purely with PL/SQL and shell scripts and works well, at least from my point of view.
What's your opinion on this?
Thank you very much,
Leon

Hi Leon,
Look at this page (it contains links to two publications with results comparing Hadoop against two RDBMSs):
http://database.cs.brown.edu/projects/mapreduce-vs-dbms/
It seems Hadoop currently has no chance against an RDBMS (in the DWH area)...
In my opinion, Hadoop/Hive is a technology and not a solution; to solve a problem with Hadoop/Hive you will need to do a lot of work.
Regards,
Oleg

Urgent help on program requirements on the OS for distributed computing

Dear Sir/Ma'am,
Can anyone please identify the program/application requirements on the Unix and Linux operating systems for database management, and also for distributed computing?
Also, can you demonstrate at least 2 arithmetic features and the AutoText feature of MS Word? I will be very thankful to you.
If you don't understand my question please contact me via e-mail; I need these details urgently.
My e-mail address: [email protected] or post reply on SDN.
Thanks and God bless you.
Waiting for your response.

Double post:
http://forum.java.sun.com/thread.jspa?threadID=784930

UCS-B for distributed computing application

Hi,
What would be the best UCS-B server configuration under CentOS for a distributed computing application run through a clustering tool like SGE or LSF?
The software is a biological application and the customer wants 400 cores in each blade.
I need just a first assessment.
Regards

What's wrong with that code is that it's not posted using code tags.
What's wrong with your thread here http://forum.java.sun.com/thread.jspa?threadID=5280844 ?

Distributed computing and enterprise computing

Can you explain to me the difference between distributed computing and enterprise computing?

Enterprise computing contains solutions that revolve around a comprehensive
set of IT methodologies, that concentrate on identifying, measuring, improving
the ability of technology to deliver against the business value-chain.
Distributed computing is the process of aggregating the power of several
computing entities to collaboratively run a single computational task in a
transparent and coherent way, so that they appear as a single, centralized
system.
My company is working on a product which is an enterprise-wide distributed
computing solution.
It is called GemFire Enterprise (GFE).
Consider taking a look at http://www.gemstone.com/products/gemfire

Why do we need distributed computing?

Hi all,
Why do we need distributed objects, and what are the major advantages that this approach offers us?
Thanks.

Look up "the 10 misconceptions of distributed programming" on the web; I haven't got time to find it in my data now. But it still fails to mention the one utterly important question: how many CPU cycles must a process miss before it becomes affordable to sacrifice even more CPU cycles to set up a TCP/IP connection, marshal every aspect of your call, send it to the other side, and process the response in the reverse order? Answer: a whole lot of CPU cycles. If the time it takes to process the thing locally is smaller than the time it takes to set the whole thing up to work across a network, then forget about it completely. You can't beat it. Not to mention that if your front end is so busy that it needs a back end to do some things, then the bandwidth it's going to need is usually so big that the boxes should normally be no more than a short distance apart, so there goes the argument for using it to address specialized services (which are usually not around the corner).
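The break-even argument above can be written down as a simple comparison. The following is a rough sketch only: OffloadCheck and its parameters are hypothetical names, and the numbers in main are made up for illustration rather than measured.

```java
/** Back-of-the-envelope check for whether a remote call can pay off. */
class OffloadCheck {
    /**
     * All arguments are hypothetical measurements in microseconds.
     * Remote execution only wins if the local work costs more than
     * connection setup + marshalling + network round trip + remote work.
     */
    static boolean worthDistributing(long localMicros, long connectMicros,
                                     long marshalMicros, long roundTripMicros,
                                     long remoteMicros) {
        long remoteTotal = connectMicros + marshalMicros + roundTripMicros + remoteMicros;
        return localMicros > remoteTotal;
    }

    public static void main(String[] args) {
        // A short local computation cannot beat a call carrying about 1 ms of overhead.
        System.out.println(worthDistributing(50, 500, 100, 400, 10));
        // A 2 second computation that a faster box finishes in 0.5 s can.
        System.out.println(worthDistributing(2_000_000, 500, 100, 400, 500_000));
    }
}
```

In practice the overhead terms would have to be measured on the real network, and connection setup can be amortized by keeping connections open, which is why the inputs are kept separate here.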

Disconnected Distributed Computing in Java

Hi, the organization I work for is heavily into JSP/Servlet applications. However, there are a large number of computers which have very slow network links, so they want to find some technology in which we can develop applications which are disconnected, then do a mass synchronization with a Servlet every hour or evening etc.
Is there a Java technology which:
1. keeps the application source centralized and downloadable to the client
2. allows for disconnected use of the application
3. synchronizes with the server when the application becomes connected to the server
thanks
Don

Jini is for application-to-application communication over heterogeneous networks and object protocols. It does not manage distribution of the application or synchronization of application data for applications with limited connectivity.
Don
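Whatever technology ends up delivering the application, requirements 2 and 3 in the question above usually come down to queuing changes locally and replaying them when a link is available. Here is a minimal sketch of that idea, assuming a hypothetical OfflineOutbox class whose sender callback stands in for the real HTTP call to the servlet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Queues changes while offline and replays them in order once a link is available. */
class OfflineOutbox {
    private final Deque<String> pending = new ArrayDeque<>();

    /** Record a change locally; works whether or not the network is up. */
    void record(String change) {
        pending.addLast(change);
    }

    int pendingCount() {
        return pending.size();
    }

    /**
     * Try to send every queued change, oldest first. The sender is a stand-in
     * for a request to the servlet and returns false when the send fails.
     * Changes stay queued from the first failure onwards, so nothing is lost.
     */
    int flush(Predicate<String> sender) {
        int sent = 0;
        while (!pending.isEmpty()) {
            if (!sender.test(pending.peekFirst())) {
                break; // still offline, try again at the next sync
            }
            pending.removeFirst();
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        OfflineOutbox outbox = new OfflineOutbox();
        outbox.record("update customer 17");
        outbox.record("new order 42");

        System.out.println(outbox.flush(change -> false)); // link down, nothing sent
        List<String> received = new ArrayList<>();
        System.out.println(outbox.flush(received::add));   // link up, queue drains
        System.out.println(received);
    }
}
```

A real client would also persist the queue to disk and deal with conflicting edits on the server side; both are left out of this sketch.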

Problem on distributed computing with java..?

My problem is:
I have one server and three clients. First, the 3 clients send data simultaneously to the server, and the server should access it at the same instant.
The server does some calculations using this data and should broadcast the results to the clients. After sending the data to the server, the clients are busy with their own work. After completing their work they should receive the data sent by the server.
Please suggest a solution.
Also, which package/methods should I use for broadcasting data to the clients?

Sockets and threads?
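To make the "sockets and threads" suggestion concrete, here is one possible sketch using only java.net and java.util.concurrent. It runs the server and the three clients in a single JVM over the loopback interface; SumBroadcastDemo is a hypothetical name, and summing the values is just a placeholder for the real calculation.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** One server and three clients in a single JVM, talking over loopback sockets. */
class SumBroadcastDemo {
    static final int CLIENTS = 3;

    /** Server: read one int from each client, then send the sum back to all of them. */
    static void serve(ServerSocket server) throws IOException {
        List<Socket> sockets = new ArrayList<>();
        try {
            int sum = 0;
            for (int i = 0; i < CLIENTS; i++) {
                Socket socket = server.accept();
                sockets.add(socket);
                sum += new DataInputStream(socket.getInputStream()).readInt();
            }
            // "Broadcast" over TCP is just a loop over the open connections.
            for (Socket socket : sockets) {
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                out.writeInt(sum);
                out.flush();
            }
        } finally {
            for (Socket socket : sockets) {
                socket.close();
            }
        }
    }

    /** Client: send a value, carry on with other work, then block for the result. */
    static int runClient(int port, int value) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            out.writeInt(value);
            out.flush();
            // ... the client's own work would happen here ...
            return new DataInputStream(socket.getInputStream()).readInt();
        }
    }

    /** Starts the server and one thread per client; returns what each client received. */
    static List<Integer> runDemo(int[] values) {
        if (values.length != CLIENTS) {
            throw new IllegalArgumentException("expected " + CLIENTS + " values");
        }
        ExecutorService pool = Executors.newFixedThreadPool(CLIENTS + 1);
        try (ServerSocket server = new ServerSocket(0, CLIENTS, InetAddress.getLoopbackAddress())) {
            Callable<Void> serverJob = () -> { serve(server); return null; };
            Future<Void> serverTask = pool.submit(serverJob);
            int port = server.getLocalPort(); // port 0 above means "any free port"
            List<Future<Integer>> pending = new ArrayList<>();
            for (int value : values) {
                Callable<Integer> clientJob = () -> runClient(port, value);
                pending.add(pool.submit(clientJob));
            }
            List<Integer> received = new ArrayList<>();
            for (Future<Integer> result : pending) {
                received.add(result.get());
            }
            serverTask.get();
            return received;
        } catch (Exception e) {
            throw new IllegalStateException("demo failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(new int[] {1, 2, 3}));
    }
}
```

Note that the server here reads the three values one after another rather than "at the same instant"; it simply waits until all three have arrived before calculating, which is usually what such a requirement amounts to.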

Distributed or grid computing question

Hi all.
I'm looking to upgrade our studio to a Mac-based system, but that leaves me with several Windows computers hanging around doing nothing. I'd find it a lot easier to justify this if they could be re-used with Compressor's distributed computing feature, so that we can process our video much faster, with less initial investment (I'd not need a 12-core Mac Pro if I could use all our dual core PCs to speed up the job)
I know that there is software out there to allow Windows PCs to act as xGrid drones, so I was wondering if anyone knew about anything similar that allows Windows machines (these happen to be running XP) to take some of the load of processing video with compressor.
Thanks for reading.
Andy.

Thanks, Jon.
I guess that Qmaster is what Compressor uses to do its grid thing on Macs?
What I was asking was whether there is some kind of FOSS software that does the same thing on Windows.
Andy.

Advice on Distributed Computing and Java

Hello there all,
I am an MS software student and have taken on a project (2.5 months) on "Distributed Computing and/in Java". It's a research project with some coding. I know Core Java and some parts of Advanced Java like JSP, RMI, Servlets etc.
I don't know much about Distributed Computing. So, how do I start? One of my friends who works in Java said it's a good area for research, and that Java combined with Distributed Computing is the best one to choose.
So, I would like to know everyone's opinion about the topic and also some guidelines on where to start. I have some idea about it:
1. Be thorough with Core Java.
2. Learn Advanced Java like RMI, JSP, Servlets and others.
3. Go for the distributed 'features' that Java has.
4. Make some small program/utility that demonstrates Java's capability in the Distributed Computing world.
This is my plan.
Any suggestion is heartily welcome.
Thanks,
Nirav Patel

Hey Nirav,
Great to get such a question. I have also done a project on this topic for my UG degree, "Distributed Computing for Java",
but it is a comparatively smaller one that doesn't use RMI, and I think you expect more. My experience is that getting help with Distributed Computing is not easy. But if you have specific queries, you will get really good solutions here on the forums.
Give your personal ID in case you want me to contact you.

Understanding cluster computing

Ok so I have written this program in python2.7 that runs on a desktop machine running Arch Linux. Without getting into the big details, this program takes a data-point and performs a series of mathematical manipulations; each Process [from multiprocessing.Process] first creates what needs to be written to the database and places it in a queue, then the main process takes the results out of the queues and hands them to new processes that write them to the database.
Everything is working but the new issue is the time, the number of calculations has gotten so high that a single point will take somewhere between 7 and 8 seconds to complete before it can grab a new entry. I know this doesn't sound like a lot of time but for this environment it is significant, truthfully I would like everything to be done in about .1 to .5 a second, question being, how do I get there?
I don't know anything about the world of super computing or high performance computing, and there seem to be several ways to implement it, but for different reasons. I have an idea of what I do not want: I don't want to adjust the program to take on distributed computing at the application layer. I think this is an example of what I don't want.
I have a slight idea of what I do want:
1) Build a server rack, centered around a motherboard like this and components that work with it.
2) Have the OS only on the master of the cluster, basically I don't want a distribution day where I am stuck updating the distribution of each motherboard/CPU(s) combo in the cluster.
3) I like arch linux but am not married to it, so anything that will work will do.
4) That I can ssh in, do whatever commands are necessary and then just ./run.py the application and the OS, or whatever, handles the distribution of the workload across all the nodes in the cluster without the application needing any additional coding for it to make use of all that power (outside of the already used multiprocessing.Process).
I haven't read anything specific on the Arch Linux site about the OS doing this. I have read this about Gentoo and this other software that I think is its own OS called Rock. Am I on the right track looking at these two pieces of software? Does anyone have anything to say about them, or about why one would be better suited to my purpose than the other? Or is there another well documented tutorial I could follow? Has anyone ever done anything like this already and knows exactly what software and configuration I should use? Thank you for your time.

Just to echo what jakobcreutzfeldt said, before you spend thousands of dollars on more hardware, work on optimizing the code and rewriting in C (or Fortran as suggested above - I think C has some practicality advantages without sacrificing performance, but that's another conversation).
Even if you don't know C, or don't know it well enough, you can definitely hire someone to revise/rewrite the code for much cheaper than what you'd pay for all that hardware.
Or ... use the forums and archers will compete against each other for the fastest implementation of the code for free. Apparently we'll even test the code, and generate performance reports - all for free. Of course I'm being mildly facetious as the code for your project may be much more complicated - but nonetheless it sounds like a project many programmers would enjoy, so contracting someone shouldn't be hard.
Last edited by Trilby (2014-05-15 11:27:44)

Test Plan for Distributed Application in Forte/Conductor

Hi, Forte Gurus:
We're at the beginning of our first big Forte project and the issue of
package testing has arisen.
By "package" we mean a collection of classes which provide some function
or service or just logically belong together.
The tools are Forte 3.x, IBC Reflection Framework, Conductor, a GUI with
tree-view explorer style and, later, Java/CORBA, etc.
If package X is under test and needs package Y, X's tester often
cannot use the real Y. He will need to stub out Y in places to provide
the inputs his test needs. Does anyone have a good testing strategy?
If this were C++, we could just link with the stubs rather than the real
deal. But because of the repository, we're forced to copy things back
and forth from plans to avoid linking with the real classes.
So far, we've come up with the idea of creating a test plan and then
taking copies of everything needed. But that's kind of ugly: for one
thing, you need to keep making your changes in the original plan and
importing them into the test plan, because if you do it the other way
around you tend to break inheritance (at least, at the class level - I
suppose you could copy individual changed methods). Another idea is to
make copies of all of our plans which hold versions of the classes, then
have them supply the test plan rather than the normal plans. That way
people can share stubs. If there's some common, elegant way to do this
I'd like to hear about it.
Please, does anyone have a good testing strategy, or a similar plan on
another distributed platform?
Any good links on the subject?
Thanks in advance
Best Regards
Ricardo J.P. Baraldi
Senior Consulting
Distributed Computing Division
International Business Corporation
(703) 691-0400 Ext 118
Visit us at: www.ibcweb.com

What do Grid and Cluster computing mean?

Hi ,
The following terminology is very confusing. Could you please explain what these terms mean and how they differ from each other?
- Cluster computing
- Grid computing
- Parallel computing
- Distributed computing
- supercomputers or conventional supercomputers
Thanks
Siva

Hans has already given an excellent reply, so I shall just be echoing the same. Since it's quite early in the morning for me, and not a weekend either, my reply will be limited.
> Grid Computing: What do we mean by multiple administrative domains?
As you might have read in the link I gave and in many other links on the web, there is no precise definition available. It's a concept. When this concept finally arrived in 10g, it was not really possible to define it except by saying that we would have many servers combined which can do the work for us as if they were one, and we don't need to know which one is working for us and which one is not. I remember an example given by HJR: electricity hubs. As consumers, we only care that when we switch on the tube light, it comes on. We are not concerned with which transformer the electricity actually comes from. And that's exactly what Grid Computing is: you need a resource, you get it; from where, you are not supposed to know. Looked at this way, this is not something that started happening only from 10g; what arrived then was a formal marketing name, Grid Computing. Oracle's RAC (earlier called OPS) has been doing exactly the same since version 6: presenting a multi-system environment to you as one. Now there is another concept added to it called administrative domains. Again, this IMO is a term, and with the Grid Computing concept maturing, its implementations are also becoming clear. For example, in 11.2 RAC we have the concept of Server Pools, which means that in a cluster of 4 nodes you can put your OLTP work on nodes 1 and 2 and DWH on nodes 3 and 4. You can add and remove nodes in these server pools on the fly, and the pools take care of things on their own. Now, since we have managed to segregate the roles of operations into two different sets of nodes which are still part of one single cluster, we can manage things differently. And I think that's what can (loosely) be called Administrative Domains.
Again, this is just one of many examples. I think, as Hans has mentioned, the definitions only really hold in the context where they are used. And it's also possible that I have given you a completely wrong picture as well.
> Are Clustering & Distributed computing the same? Multiple computers are involved in both Clustering & Distributed computing to compute the data, so what is the difference?
Well, no, that's not true. Clustering is combining the computing power of many computers as one and using it to "divide and rule". Distributed computing is where you actually do the same work divided across many machines. For example, if you run a transaction that spans many machines, and when you commit, all of them must say that they are okay with what you have done, that's Distributed computing. Just because many computers are involved in both doesn't make them the same.
3) There is no end to this, I guess. We are in an environment where needs and demands are growing constantly. There is no fixed notion of what is large or fast anymore. What you call a supercomputer today may work as a normal machine tomorrow. Things are becoming faster and smaller, and the thirst for more just keeps growing. I don't think it's something you should be bothered much about. For the last 20 years or so there has been constant evolution in technology, and I can't imagine where this will end or what the final outcome of this rat race of "faster, larger" will be.
Hope it makes some sense and helps you.
Aman....
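The distributed transaction described in the reply above, where a commit only goes through if every machine says it is okay, is commonly implemented as a two-phase commit. The following is a heavily simplified sketch of that idea: Participant, Node and TwoPhaseCommitSketch are hypothetical names, and real implementations (Oracle's included) add logging, timeouts and recovery that are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified two-phase commit: commit everywhere or nowhere. */
class TwoPhaseCommitSketch {
    /** Hypothetical participant: one machine that holds part of the transaction. */
    interface Participant {
        boolean prepare();   // vote: true means "I can commit"
        void commit();
        void rollback();
    }

    /** Phase 1 asks everyone to vote; phase 2 commits only on a unanimous yes. */
    static boolean run(List<? extends Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (!p.prepare()) {
                // One "no" vote aborts the whole transaction.
                for (Participant done : prepared) {
                    done.rollback();
                }
                return false;
            }
            prepared.add(p);
        }
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    /** In-memory participant used only for the demo. */
    static final class Node implements Participant {
        final boolean healthy;
        String state = "idle";

        Node(boolean healthy) {
            this.healthy = healthy;
        }

        public boolean prepare() {
            if (healthy) {
                state = "prepared";
            }
            return healthy;
        }

        public void commit() {
            state = "committed";
        }

        public void rollback() {
            state = "rolled back";
        }
    }

    public static void main(String[] args) {
        Node a = new Node(true);
        Node b = new Node(false); // this node votes "no"
        boolean committed = run(Arrays.asList(a, b));
        System.out.println(committed + ": a is " + a.state + ", b is " + b.state);
    }
}
```

In a cluster that merely pools computing power, no such agreement step is needed; it is the shared transaction outcome across machines that marks the distributed case in the sense used above.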

Hack my security!

I am in the process of creating a web application that talks to a servlet, and I need a reasonable way to allow only my code to run client-side. In other words, I want the servlet to be able to guarantee that the client is running my code (let's call the class "Benign"), not malicious code. If you know a bit of computing theory, you'll realize that this is impossible ("Any system can emulate any other system"). But I've made a pretty nasty Catch-22 that I think just might work. Here it is:
The appl(et/ication) is loaded, and establishes a connection. It creates an instance of Benign, and sends it through the connection.
If the servlet gets a LinkageError or somesuch message, the received class is different, and is therefore rejected.
Good so far.
But what if the client then creates an instance of type Malignant and uses that instead?
Here's the catch. In the initialization, Benign creates a connection to the server, and notifies it of where the applet is running. If it is not allowed to do so, it throws an Error or Exception, interrupting its own creation.
Benign sends a copy of itself to the servlet.
The servlet now knows that there is an instance of Benign running on the client!
My only concern is that someone might send a copy of Benign's bytecode to the servlet instead of a real Benign. It wouldn't be easy (therefore this starts to fall under the realm of unreasonable), but it could be done.
That's a summary of how the verification works, but I know it can be broken. And I'd rather be surprised sooner than later. So any suggestions on how to hack it (within reason)? I need to know this, so I can add one more level of verification. Anybody who can crack that DESERVES to run their own code!
10 Dukes to be distributed.

> ... What's to keep malicious code from just sending a copy of benign to the server and doing whatever the hell it wants?
Because that copy of Benign is talking to the server as well. And the server will only accept data that comes through Benign. This is pretty tricky, but if you stare at it long enough it becomes clear. BTW, I came up with this idea at 2:00 a.m., so it took me a while to understand it. The applet can't send an initialized instance of Benign to the server until Benign has been created...and in doing so, Benign sends itself to the server, too! Got it?
To answer the other question (from kleink):
+ The goal is to verify that my code is running on their computer. Sort of like, I'm providing a service, and they'd better let my code run in return (instead of a mockup)

J2EE Design Questions

Please forgive me, I am just getting into J2EE and, despite having a mountain of manuals on my desk, I'm having a little trouble figuring out how everything fits together...
Question 1.
Is there any standard application entry point for an EJB container... kind of like main() in a plain old java application, or the contextInitialized() method for a ServletContextListener in a web container?
Question 2.
Is there any kind of an analog to web.xml for EJB containers? If not, is there some kind of standard way to set initialization parameters for EJBs?
Question 3.
Is there any kind of JNDI service provider created for me by my EJB container? For example, the (rather old) article here:
http://java.sun.com/products/jndi/tutorial/getStarted/examples/naming.html
suggests the following code:
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
// Environment that tells JNDI which service provider to use
Hashtable<String, Object> env = new Hashtable<>();
env.put(Context.INITIAL_CONTEXT_FACTORY,
    "com.sun.jndi.fscontext.RefFSContextFactory");
Context ctx = new InitialContext(env);
Specifically, 'Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.fscontext.RefFSContextFactory"' looks like a bunch of voodoo to me, and it also mentions something about the file system... I don't want my JNDI to have anything to do with my filesystem. Do I need to install some kind of JNDI server or LDAP server or something? Sorry... I'm really confused.
Question 4.
In my test application, I'm trying to centralize logging. I have a stateless EJB that implements logging capability. When instantiated, (in theory) it should register its logging element with JNDI, so that further instantiations of the logging EJB all direct their calls to the same element.
I fully realize that this would imply (at best), in a distributed environment, the rather strange behavior of logging calls bouncing around to the first machine that happened to instantiate a logging bean...
regardless, I have the following broken code:
import javax.ejb.Stateless;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;
@Stateless
public class LogEJB implements VCMarkWeb.log.LogEJBRemote {
    private transient Context context;
    private transient Log     log;
    private transient boolean logMaster = false;
    private int logLevel;
    public
    LogEJB() {
        try {
            context = new InitialContext();
            // Reuse the shared log if an earlier instance already bound one.
            Log globLog = null;
            try {
                globLog = (Log) context.lookup("log");
            }
            catch (NameNotFoundException notBoundYet) {
                // Typically thrown when nothing is bound under "log" yet;
                // in that case this instance creates the log below.
            }
            if (globLog != null) {   // originally tested "log", which is always null here
                this.log = globLog;
                return;
            }
            log = new LogWindow(800, 600, "LogBean", 5);
            log.out("Logging born.");
            context.bind("log", log);
            this.logMaster = true;
        }
        catch (Exception exception) {
            System.out.println("LogEJB: Exception while instantiating.");
            System.out.println(exception.getMessage());
        }
    }
    // Kept as posted: this override is the problem discussed below.
    public void
    finalize() {
        if (!logMaster)
            return;
        try {
            context.unbind("log");
        }
        catch (Exception exception) {
            System.out.println("LogEJB.finalize(): Exception unbinding log.");
            System.out.println(exception.getMessage());
        }
    }
}
Obviously, this code is broken, because my EJB has overridden finalize(), which is apparently not allowed. In this case (since I'm going to have to delete that finalize() method), how could I ensure that my reference to a logging element is unbound once the owner of that logging element is garbage-collected?
..or.. if I'm asking Question #5 in too convoluted of a way...
Question 5'. Say that, when my EJB container is kicked off, I want to create a single class containing a Swing window, which will exist for the lifetime of the container and to which I can pass a handle, such that any EJBs can invoke its methods. How would I do that, and how would I pass my EJBs a handle to said class the 'right way'(tm)?
Thanks so much for your assistance, J2EE design gurus :)

Singletons run counter to the very concept of distributed computing, and EJBs are of course liable to be distributed.
So while you may have a class that behaves like a singleton inside its own enterprise application context, there is no guarantee it will behave like a singleton when seen from the perspective of a client.
In fact, when the enterprise application is run in a distributed environment it's guaranteed to not be a singleton anymore as each JVM instance will have its own instance of the class loaded.
It can get even worse depending on application server architecture, and you may end up having several instances of your "singleton" running inside the same server (possibly the same application) though that is more rare.
So the best thing to do with singletons is forget all about them.
