Suffix tree clustering algorithm?

I am in my final year of computer science and I need to get hold of a suffix tree clustering algorithm that can be used in JavaScript. I have searched all over the net but have come up with nothing, and was wondering if anybody could help!

In the section, "Longest Repeated Substring", it says, "the longest repeated substring of txt[1..n] is indicated by the deepest fork node in the suffix tree, where depth is measured by the number of characters traversed from the root".
However, it doesn't always seem to be true (hmm... probably I am interpreting it incorrectly).
Consider the original string = abb$
(with index 1234)
The suffix trie looks like:
          bb$
root--------------
|
|    abb$
|----------
The suffix tree looks like:
          (2,4)
root-------------
|
|    (1,4)$
|---------
However, there isn't even a fork node.
But obviously, b is the longest repeated string.
Another example
Consider the original string = abc$
(with index 1234)
The suffix trie looks like:
          c$
root--------------
|
|    bc$
|----------
|
|
|    abc$
|----------
The suffix tree looks like:
          (3,4)
root--------------
|
|    (2,4)
|----------
|
|
|    (1,4)
|----------
However, there isn't a fork node here either. But in this case, there is no repeated string.
Hope anyone can answer this. Thanks.
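
A note on the first example: the suffixes of abb$ are abb$, bb$, b$ and $, and the listing above leaves out b$ and $. Once b$ is inserted it shares the prefix b with bb$, so the tree does get a fork node at depth 1, which agrees with the quoted claim. The following is only a brute-force sketch for checking small cases, not a suffix tree: it sorts all suffixes and takes the longest common prefix of neighbouring suffixes, which has the same length as the depth of the deepest fork node. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class LongestRepeatedSubstring {
    // Length of the common prefix of a and b.
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Sort all suffixes, then compare neighbours.
    // Quadratic space and worse time: for illustration only.
    static String longestRepeated(String text) {
        String[] suffixes = new String[text.length()];
        for (int i = 0; i < text.length(); i++) {
            suffixes[i] = text.substring(i);
        }
        Arrays.sort(suffixes);
        String best = "";
        for (int i = 1; i < suffixes.length; i++) {
            int len = commonPrefixLength(suffixes[i - 1], suffixes[i]);
            if (len > best.length()) {
                best = suffixes[i].substring(0, len);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("[" + longestRepeated("abb$") + "]"); // prints [b]
        System.out.println("[" + longestRepeated("abc$") + "]"); // prints []
    }
}
```

For abc$ all four suffixes start with different characters, so there is no fork node below the root and the result is the empty string.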

Similar Messages

Suffix Tree Disk Based

I am doing a senior project on genomic code searching. I have already written the Java code that builds a suffix tree and searches it, as a memory-based version. Now I have a problem with the disk-based version: my advisor wants me to build a disk-based suffix tree to handle something as large as the human genome. He suggested that I read papers such as http://www.eecs.umich.edu/~jignesh/publ/stc.pdf and try to implement one of them. Does anyone have experience writing this kind of program? I can understand the algorithms in some of the papers, but I can't implement them. Please give me the code or some suggestions. My advisor needs me to compare the performance of a disk-based version with my memory-based program. I would be grateful if anyone could give me the code.
Thank you so much

Sorry about this, but I'm not asking others to do my project. I have already done the programming for the memory-based version. For the disk-based one, my advisor told me that I can download code from the internet and use it to compare the performance with my program. Since all the papers about disk-based suffix trees are PhD theses, I sincerely say that I don't have enough ability to implement what the published papers describe. I just need some code so that I can compare its performance with my own program and see the difference between memory-based and disk-based.
Or can anyone give me some suggestions? Sorry for the misunderstanding. I just need some help to complete the experimental part about the performance of my program. Thanks again

Suffix Tree and similar

Hi,
I'm stuck with the app I'm writing and I really need some good ideas.
The basic problem is:
I generate a list of strings; every time I produce a new string I need to check whether it is already present in the list, and if not I add it to the list.
Each string is ordered, and the list is ordered only with respect to the first char.
To cope with that I used a tree structure similar to a suffix tree: I insert the string into the tree and it easily checks whether the string is already present or not.
The tree works well, but with strings of length 7 or more the heap runs out of space.
Maybe I need a better implementation of the suffix tree.
Do you know any?
Or some other idea that requires less space?
Thanks in advance!

Use a HashSet and not a list. It's exactly for that purpose.
You can later still copy that Set into a List and use a Comparator for custom sorting.
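
A minimal sketch of that suggestion, assuming the strings fit in memory. The class and method names are invented for illustration. Set.add already reports whether the element was new, so no separate contains check is needed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueStrings {
    private final Set<String> seen = new HashSet<>();

    // Returns true if the string was not present yet and has now been stored.
    boolean addIfAbsent(String s) {
        return seen.add(s);
    }

    // Copies the set into a list ordered by first character only,
    // as described in the question. Empty strings sort first.
    List<String> sortedByFirstChar() {
        List<String> list = new ArrayList<>(seen);
        list.sort(Comparator.comparingInt((String s) -> s.isEmpty() ? -1 : s.charAt(0)));
        return list;
    }

    public static void main(String[] args) {
        UniqueStrings strings = new UniqueStrings();
        System.out.println(strings.addIfAbsent("bcd")); // true
        System.out.println(strings.addIfAbsent("abc")); // true
        System.out.println(strings.addIfAbsent("bcd")); // false, already present
        System.out.println(strings.sortedByFirstChar()); // [abc, bcd]
    }
}
```

A HashSet stores each string once, so memory use grows with the number of distinct strings rather than with the number of tree nodes.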

Suffix Tree Library

Is there any library for creating a suffix tree that also has a method to search for a substring in the tree?

Now posted at java-forums.org: http://www.java-forums.org/advanced-java/16734-suffix-tree-library.html

Good opensource library for suffix trees, longest common subsequence ..?

What opensource / free (preferably GPL/LGPL licensed) libraries for suffix trees, longest common subsequence and longest common contiguous subsequence do exist for Java?
Any practical experiences?


Simple Tree searching algorithm

Hi folks,
I just need someone to give me a little hint about applying the A* (A-star) algorithm to an n-ary tree. I guess some of you reading this have a good grasp of algorithms and can help me. I have been stuck for 3 nights :(
Thanks a lot in advance for caring to read.

No one knows what your tree (API) looks like, so no one knows how your A* algorithm can make a guess as to where to go to find the finish node. Post what you have been trying here.
When posting code, please use code tags: http://forum.java.sun.com/help.jspa?sec=formatting

N-Tree rendering algorithm.

Hello,
I'm trying to create a program that can arrange a number of people's contact information into an n-Tree structure.
I'm using a class called JBigTree that extends JPanel and overriding paint() to paint the entire tree with basically a rectangle for each person and their name in the middle.
I'm going half-crazy at the moment trying to get a rendering algorithm that does not eventually overlap two or more nodes on top of each other...
I'm going around looking for some sort of algorithm that's already been done to render an n-Tree of variable size to screen but so far I haven't found one.
To give you an idea, it looks a bit like this:
class Person {
    String name;
    ArrayList<Person> children;
    Person parent;
    /* Add obvious get/set/add methods */
}

class JBigTree extends JPanel {
    private static final int CELL_WIDTH=100;
    private static final int CELL_HEIGHT=40;
    private static final int H_GAP=30;
    private static final int V_GAP=36;
    /*skip some of the initialization code...*/

    public void paint(Graphics g) {
        renderNode(g,root,getWidth()/2,V_GAP);
    }

    private void renderNode(Graphics g, Person node, int topX, int topY) {
        //HELP!!!!
    }
}

In case anyone googles this, I've gotten an algorithm working after a couple days of letting the problem simmer in my head and a couple hours of hardcore coding...
It also allows to select single nodes by clicking...
You can figure out the Node's class code on your own, it simply uses an ArrayList to store kids and stores its parent node, no other restrictions.
Here it is:
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.event.MouseEvent;
import java.awt.event.MouseListener;
import javax.swing.JPanel;
import com.kangen.JMain;
import com.kangen.helpers.JDistributeur; // This is the actual Node class; in my case it also represents a reseller ("Distributeur" in French), hence the name.

public class JBigTree extends JPanel implements MouseListener {
    private static final int CELL_WIDTH=100;
    private static final int CELL_HEIGHT=40;
    private static int H_GAP=16;
    private static final int STATIC_V_GAP=36;
    private static int V_GAP=STATIC_V_GAP;
    private static final Color BG_COLOR=Color.white;
    private static final Color NODE_FRAME_COLOR=Color.BLUE;
    private static final Color NODE_FILL_COLOR=Color.cyan;
    private static final Color LINK_COLOR = Color.red;
    private JDistributeur nodeSelected=null;

    public JBigTree() {
        addMouseListener(this);
    }

    public void paint(Graphics g) {
        g.setColor(BG_COLOR);
        g.fillRect(0,0, getWidth(),getHeight());
        // Size the panel from the width and height of the tree.
        int displayWidth=((treeWidth(JMain.getRootNode()))*(CELL_WIDTH+H_GAP))*2;
        setSize(displayWidth,getHeight());
        setPreferredSize(new Dimension(displayWidth,getHeight()));
        setMinimumSize(new Dimension(displayWidth,getHeight()));
        int displayHeight,treeHeight=treeHeight(JMain.getRootNode());
        V_GAP= STATIC_V_GAP + (STATIC_V_GAP * (treeHeight/8));
        displayHeight=treeHeight(JMain.getRootNode())*(CELL_HEIGHT+V_GAP);
        setSize(getWidth(),displayHeight);
        setPreferredSize(new Dimension(getWidth(),displayHeight));
        setMinimumSize(new Dimension(getWidth(),displayHeight));
        renderNode(g,JMain.getRootNode(), (getWidth()/2)-(CELL_WIDTH/2) ,(V_GAP) );
    }

    // Draws one node at (topX, topY), then recursively draws its children below it.
    private void renderNode(Graphics g,JDistributeur node, int topX, int topY) {
        if (node==null||g==null)
            return;
        node.setTopX(topX);
        node.setTopY(topY);
        g.setColor(NODE_FRAME_COLOR);
        g.drawRect(topX, topY, CELL_WIDTH, CELL_HEIGHT);
        if (node==getNodeSelected()) {
            g.setColor(NODE_FILL_COLOR);
            g.fill3DRect(topX, topY, CELL_WIDTH, CELL_HEIGHT,true);
        }
        g.setColor(NODE_FRAME_COLOR);
        g.drawString(node.getNom(),topX+5,topY+(CELL_HEIGHT/2));
        if (node.getChildrenNodes().size()==0)
            return;
        int totalChildren = treeWidth(node);
        int leftZeroForThisNode = topX-(int)((double)(CELL_WIDTH+H_GAP)*(((double)totalChildren/2.0)))+(CELL_WIDTH/2);
        if(leftZeroForThisNode<0) {
            leftZeroForThisNode=0;
            JMain.ERROR("leftZeroForThisNode < 0");
        }
        int childrenDrawn = 0;
        for (int i=0;i<node.getChildrenNodes().size();i++) {
            int thisNodeChildren = treeWidth(node.getChild(i));
            childrenDrawn+=thisNodeChildren;
            int newTopX=leftZeroForThisNode+((childrenDrawn*(CELL_WIDTH+H_GAP))/2);
            newTopX-=(CELL_WIDTH/2);
            renderNode(g,node.getChild(i),newTopX,topY+V_GAP+CELL_HEIGHT);
            g.setColor(LINK_COLOR);
            g.drawLine(topX+(CELL_WIDTH/2),topY+CELL_HEIGHT,newTopX+(CELL_WIDTH/2),topY+V_GAP+CELL_HEIGHT);
            leftZeroForThisNode+=thisNodeChildren*(CELL_WIDTH+H_GAP);
        }
    }

    // Width measure of a subtree, used for horizontal spacing.
    private int treeWidth(JDistributeur node) {
        if (node==null)
            return 0;
        int nb=0;
        for (int i=0;i<node.getChildrenNodes().size();i++)
            nb+=treeWidth(node.getChild(i));
        return 1 + (nb>0?nb-1:0);
    }

    // Number of levels in a subtree.
    private int treeHeight(JDistributeur node) {
        if (node==null)
            return 0;
        int maxChild=0;
        for (int i=0;i<node.getChildrenNodes().size();i++) {
            int childHeight = treeHeight(node.getChild(i));
            if (childHeight>maxChild)
                maxChild=childHeight;
        }
        return 1+maxChild;
    }

    @Override
    public void mouseClicked(MouseEvent arg0) {
        // TODO Auto-generated method stub
    }

    @Override
    public void mouseEntered(MouseEvent arg0) {
        // TODO Auto-generated method stub
    }

    @Override
    public void mouseExited(MouseEvent arg0) {
        // TODO Auto-generated method stub
    }

    // Toggles the selection of the node under the mouse pointer.
    @Override
    public void mousePressed(MouseEvent arg0) {
        JDistributeur nodeClicked = nodeClicked(JMain.getRootNode(),arg0.getX(),arg0.getY());
        if (nodeClicked!=null) {
            if (getNodeSelected()==nodeClicked)
                setNodeSelected(null);
            else
                setNodeSelected(nodeClicked);
            JMain.mainWindow.sidePanel.showNode(getNodeSelected());
            repaint();
        }
    }

    // Depth-first search for the node whose rectangle contains (x, y).
    private JDistributeur nodeClicked( JDistributeur node, int x, int y) {
        if( (x > node.getTopX()) && (x<node.getTopX()+CELL_WIDTH) && (y>node.getTopY()) && (y<node.getTopY()+CELL_HEIGHT))
            return node;
        else {
            for(int i=0;i<node.getChildrenNodes().size();i++) {
                JDistributeur temp = nodeClicked(node.getChild(i),x,y);
                if(temp!=null)
                    return temp;
            }
            return null;
        }
    }

    @Override
    public void mouseReleased(MouseEvent arg0) {
        // TODO Auto-generated method stub
    }

    public void setNodeSelected(JDistributeur nodeSelected) {
        this.nodeSelected = nodeSelected;
    }

    public JDistributeur getNodeSelected() {
        return nodeSelected;
    }
}

Tree creation algorithm needed. Experts pls help

hi,
I'm writing a program that reads data from a database to create a Tree.
The database table:
DATA    PREDATA    POSTDATA
1       4          5
2       6          7
3       1          2
4       8          9
If the DATA has a PREDATA, it will be inserted into the left node otherwise null and POSTDATA will be inserted into right node. For example if I pass in 3 the Tree should look like the following:
3
1 2
4 5 6 7
8 9 null null null
Thanks

This is a fairly straightforward problem but I see two possible problems.
1) There could be more than one root. Any row with identifier that is never used as pre or post is a root.
2) If a row identifier is used more than once as pre or post then one no longer has a simple binary tree.
Dealing with 1) is easy - just have a set of roots. Dealing with 2) is not so easy.
My solution follows. It deals with 1 but not 2.
import java.util.*;

public class Test20040622 {
    static String[][] testData = {
        {"1","4","5"},
        {"2","6","7"},
        {"3","1","2"},
        {"4","8","9"},
    };

    static class TreeElement {
        TreeElement(String value, Object pre, Object post) {
            this.value = value;
            this.pre = pre;
            this.post = post;
        }

        public String toString() {
            return value;
        }

        String value;
        // pre and post hold a String id at first; once resolved they hold a TreeElement
        Object pre;
        Object post;
    }

    private Set treeRoots;

    public Test20040622() {
        HashMap map = new HashMap();
        // Build a map pointing from id to TreeElement
        for (int rowIndex = 0; rowIndex < testData.length; rowIndex++) {
            String[] row = testData[rowIndex];
            map.put(row[0], new TreeElement(row[0], row[1], row[2]));
        }
        // The set of keys to the TreeElements
        treeRoots = new HashSet(map.keySet());
        // Go through each of the elements replacing the
        // pre and post String with the corresponding TreeElement
        for (Iterator it = map.values().iterator(); it.hasNext();) {
            TreeElement treeElement = (TreeElement)it.next();
            if (treeElement.pre instanceof String) {
                treeRoots.remove(treeElement.pre);
                Object pre = map.get(treeElement.pre);
                if (pre != null)
                    treeElement.pre = pre;
            }
            if (treeElement.post instanceof String) {
                treeRoots.remove(treeElement.post);
                Object post = map.get(treeElement.post);
                if (post != null)
                    treeElement.post = post;
            }
        }
        // Convert the tree root Strings to TreeElements
        HashSet roots = new HashSet();
        for (Iterator it = treeRoots.iterator(); it.hasNext();)
            roots.add(map.get(it.next()));
        treeRoots = roots;
    }

    public void print() {
        for (Iterator it = treeRoots.iterator(); it.hasNext();)
            printOne(it.next(), 0);
    }

    // In-order print: pre subtree, then this element, then post subtree
    private void printOne(Object leaf, int level) {
        if (leaf instanceof TreeElement)
            printOne( ((TreeElement)leaf).pre, level+1);
        for (int i = 0; i < level; i++)
            System.out.print("   ");
        System.out.println(leaf);
        if (leaf instanceof TreeElement)
            printOne(((TreeElement)leaf).post, level+1);
    }

    public static void main(String[] args) {
        new Test20040622().print();
    }
}

What is the clustering algorithm in IMAQ AutoMThreshold VI?

Hello!
I am looking at the IMAQ AutoMThreshold (as well as the AutoBThreshold) and it does not tell us which algorithms are being used to set
the thresholds. Does anyone know the underlying details?
There are about 40 different techniques described in http://www.busim.ee.boun.edu.tr/~sankur/SankurFolder/Threshold_survey.pdf
which have various performance in different cases.
Thanks
D.

Before you can answer that question, you have to answer the question "what is the zip format"? It is a trick question. There has never been such a thing. Zip was invented by the proverbial "some dude" (Phil Katz) who was good at coding. The Zip format was never standardized, so there is really no such thing as a zip file. There are many zip variants that use different forms of encryption.
The original zip encryption was a homemade algorithm written by Roger Schlafly. He has a PhD in math, but this was an early attempt before the field was mature and people started giving names to individual algorithms. Here is some information about the original zip encryption: http://cs.sjsu.edu/~stamp/crypto/PowerPoint_PDF/8_PKZIP.pdf
And here is a paper about one particular Zip variant: http://eprint.iacr.org/2004/078.pdf

Need help with tree edit distance and restricted top-down mapping algorithm

*This topic was posted a while ago in the "java programming" section, but it was suggested that I try here
Hi everyone,
A couple of days ago I posted a topic on analyzing the structural similarity between two web pages. After some research, I know I need to work out some tree matching algorithms: the tree edit distance algorithm (TED) and an improved version, the restricted top-down mapping algorithm (RTDM). TED calculates the minimum operation cost (insert, delete, replace) to map one tree into another. RTDM further restricts the 3 operations to only the leaf nodes so as to improve time complexity.
This is the general idea, but I'm having difficulty finding resources that let me understand and implement the algorithms. I'm using the ACM portal (Association for Computing Machinery) to access the technical papers, but I find that they do not provide enough info; Google gives mostly the same technical papers and some websites that illustrate the general idea of these algorithms.
I'm hoping that you can give me some guidance on these 2 algorithms. I'm not looking for code, but I need more details on them. Thanks in advance.
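
To make the top-down idea concrete, here is a rough sketch of a Selkow-style top-down edit distance with unit costs. It is not RTDM itself: here the roots are always mapped to each other, a child can only be inserted or deleted together with its whole subtree, and relabeling is allowed at any mapped node, whereas RTDM as described above restricts replacement to leaves as well. The class names, the cost model and the example trees are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TopDownTreeDistance {
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) {
                children.add(kid);
            }
        }

        // Number of nodes in this subtree; used as the cost of
        // inserting or deleting the whole subtree.
        int size() {
            int n = 1;
            for (Node child : children) {
                n += child.size();
            }
            return n;
        }
    }

    // Cost of turning tree a into tree b. The two roots are mapped to each
    // other; the child lists are aligned like strings in edit distance.
    static int distance(Node a, Node b) {
        int m = a.children.size();
        int n = b.children.size();
        int[][] d = new int[m + 1][n + 1];
        d[0][0] = a.label.equals(b.label) ? 0 : 1; // relabel cost for the roots
        for (int i = 1; i <= m; i++) {
            d[i][0] = d[i - 1][0] + a.children.get(i - 1).size();
        }
        for (int j = 1; j <= n; j++) {
            d[0][j] = d[0][j - 1] + b.children.get(j - 1).size();
        }
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                Node ca = a.children.get(i - 1);
                Node cb = b.children.get(j - 1);
                int delete = d[i - 1][j] + ca.size();
                int insert = d[i][j - 1] + cb.size();
                int map = d[i - 1][j - 1] + distance(ca, cb);
                d[i][j] = Math.min(map, Math.min(delete, insert));
            }
        }
        return d[m][n];
    }

    public static void main(String[] args) {
        Node page1 = new Node("html", new Node("body", new Node("p"), new Node("p")));
        Node page2 = new Node("html",
                new Node("body", new Node("p"), new Node("div", new Node("span"))));
        // One relabel (p to div) plus one inserted leaf (span).
        System.out.println(distance(page1, page2)); // prints 2
    }
}
```

For web pages, the nodes would be DOM elements and the labels their tag names; the distance can then be normalized by the tree sizes to get a similarity score.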

For scientific research I prefer Scirus: http://www.scirus.com/
Just two pages I found on a quick search:
http://arxiv.org/abs/cs/0604037
http://www.cs.uic.edu/~yzhai/
The latter might not be exactly what you asked for, but you might be interested in the listed publications. I have not taken a closer look.

String algorithms

Does anyone remember the suffix tree algorithm, or know a tutorial or a book about string matching/patterns?
Thanks!

http://lmgtfy.com/?q=suffix+tree+algorithm
http://www.lmgtfy.com/?q=string+matching+patterns

I'm just wondering: when is a Component Tree constituted for the first time?

When I look at the JSF life cycle figure I can see "Reconstitute Component Tree", but I don't see anything like "Constitute Component Tree". When is it supposed to happen?
Thanks for your answer.

If there is no existing tree (for example if it's the first time you reach a certain page), you start with an empty tree. It is up to you to create a new tree, either directly from Java, or (which is probably going to be the most typical scenario) through a template language like JSP using the Standard HTML RenderKit Tag Library.
When using JSP, the JSF spec defines a sort of tree merging algorithm, which merges an existing tree with a tree defined in your JSP file. If the initial tree is empty, the resulting tree is going to depend 100% on the tree defined in your JSP. If there is an initial tree, slightly more complex rules may apply.

Clustering in java

I know this is not a specific problem related to Java, but I am looking to implement in Java some sort of clustering algorithm to cluster points on a map. I was just wondering if anyone had any experience with this, or knew of any good related web pages.
Thanks for your help

For clustering low-dimensional data (like 2-D map coordinates) your best bets are "k-means clustering" and "agglomerative hierarchical clustering". Both are fairly fast, not difficult to implement, and produce reasonable results. Google for details; I imagine there are probably Java implementations out there somewhere.
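
As a starting point, here is a small sketch of k-means (Lloyd's algorithm) for 2-D points. It makes several simplifying assumptions: the first k points are used as the initial centroids, coordinates are treated as planar (reasonable for a small map area, not for points spread across the globe), and the class and method names are invented for illustration.

```java
import java.util.Arrays;

class KMeans2D {
    // points[i] = {x, y}. Returns, for each point, the index of its cluster.
    // Assumes points.length >= k.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone(); // naive initialisation
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dx = points[i][0] - centroids[c][0];
                    double dy = points[i][1] - centroids[c][1];
                    double dist = dx * dx + dy * dy;
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][2];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                sums[assignment[i]][0] += points[i][0];
                sums[assignment[i]][1] += points[i][1];
                counts[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    centroids[c][0] = sums[c][0] / counts[c];
                    centroids[c][1] = sums[c][1] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {
            {0, 0}, {10, 10}, {0, 1}, {1, 0}, {10, 11}, {11, 10}
        };
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
        // prints [0, 1, 0, 0, 1, 1]
    }
}
```

The result depends on the initial centroids, so real implementations usually run several random restarts or use a smarter seeding scheme.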

Oracle text clustering - use of Stemming

Hi,
I am using K-means to cluster a set of documents. But I am unable to set the clustering algorithm parameters to use stemming for the tokens. The clustering algorithm uses 'move', 'moving', 'moves', 'moved' as separate words and clusters the documents into different clusters. I would like to group all the documents that contain 'move', 'moving', 'moves', 'moved' into a single group by using the stem 'move'. I am unable to do this so far. In case any of you have some ideas, please suggest.
I use the following preferences and attributes to create a text index:
BEGIN
CTX_DDL.DROP_PREFERENCE ('test_lex');
CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
CTX_DDL.SET_ATTRIBUTE ('test_lex', 'INDEX_STEMS', 'ENGLISH');
END;
drop index temp_idx;
CREATE index temp_idx ON temp(text1) indextype is CTXSYS.CONTEXT parameters ('WORDLIST CTXSYS.BASIC_WORDLIST LEXER test_lex SYNC (ON COMMIT)');
And below is the code I use to cluster the documents:
create table temp0 (docid NUMBER, clusterid NUMBER, score NUMBER);
create table temp1 (clusterid NUMBER, descript varchar2(4000), label varchar2(200), sze number, quality_score number, parent number);
begin
ctx_ddl.drop_preference('my_cluster');
ctx_ddl.create_preference('my_cluster','KMEAN_CLUSTERING');
ctx_ddl.set_attribute('my_cluster','CLUSTER_NUM','10');
ctx_ddl.set_attribute('my_cluster','STEM_ON','FALSE');
ctx_output.start_log('my_log');
ctx_cls.clustering('temp_idx','seq','temp0','temp1','my_cluster');
ctx_output.end_log;
end;
Thanks!

Set the following attribute to TRUE, i.e. change
ctx_ddl.set_attribute('my_cluster','STEM_ON','FALSE');
to
ctx_ddl.set_attribute('my_cluster','STEM_ON','TRUE');
and then create the clusters. Also, as you have already done, the lexer should have INDEX_STEMS set.

K mean clustering

I want to implement the k-means clustering algorithm in Oracle. Is it possible to do so?
I want to group similar elements of a single attribute using k-means clustering.
If yes, then please help me in this regard.
Edited by: user13068721 on May 3, 2010 12:54 AM

http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/cdatadic.htm#CCREF2039
http://download.oracle.com/docs/cd/B28359_01/text.111/b28303/classify.htm#CCAPP9228
