A slightly unusual String question

Hi everybody,
Does anyone know if there is a limit to how long Strings can be? I'm parsing an html file and the opening and closing paragraph tags are not always on the same line. The only way I could think of to make sure I parsed out the paragraphs correctly was to squish the whole HTML body (which is very long) into one big String and scan through it for the tags. Hence my question; I don't want to kill my JVM with an OutOfMemoryError.
Any help on this would be great.
Thanks,
Jezzica85
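
On the length limit: a Java String is indexed by int, so it can never hold more than Integer.MAX_VALUE characters, and in practice the available heap is what you hit first. An HTML page of ordinary size is nowhere near either limit. Below is a minimal sketch of the scan described above, assuming the whole body is already in one String and that paragraphs are not nested. The class and method names are made up for illustration, and a real HTML parser would be more robust than a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphScan {
    // DOTALL lets '.' match line breaks, so a paragraph may span several lines.
    private static final Pattern PARAGRAPH =
        Pattern.compile("<p\\b[^>]*>(.*?)</p>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the text between each opening and closing paragraph tag.
    static List<String> paragraphs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<body><p>first\nparagraph</p>\n<P class=\"x\">second</P></body>";
        for (String p : paragraphs(html)) {
            System.out.println("[" + p + "]");
        }
    }
}
```

The reluctant quantifier (.*?) stops at the first closing tag, which is what keeps two paragraphs from being merged into one match.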

Extract from Word docs, you say? I've done that:
package com.doesthatevencompile.desktopsearch.filetypes.framework.msdoc;
import org.apache.lucene.document.Document;
import org.textmining.text.extraction.WordExtractor;
import com.doesthatevencompile.desktopsearch.filetypes.framework.DocumentFieldHelper;
import com.doesthatevencompile.desktopsearch.filetypes.framework.DocumentHandler;
import com.doesthatevencompile.desktopsearch.filetypes.framework.DocumentHandlerException;
import java.io.InputStream;
public class TextMiningWordDocHandler implements DocumentHandler {
  public Document getDocument(InputStream is)
    throws DocumentHandlerException {
    String bodyText = null;
    try {
      bodyText = new WordExtractor().extractText(is);
    } catch (Exception e) {
      throw new DocumentHandlerException(
        "Cannot extract text from a Word document", e);
    }
    if ((bodyText != null) && (bodyText.trim().length() > 0)) {
      Document doc = new Document();
      DocumentFieldHelper.addFieldToDocument(doc, DocumentFieldHelper.KEYWORD_ALL_TEXT, bodyText);
      DocumentFieldHelper.setDocumentType(doc, DocumentFieldHelper.TYPE_DOC);
      return doc;
    }
    return null;
  }
}
The lib you need can be found here:
http://doesthatevencompile.com/current-projects/code-sniplets/lib/
called tm-extractors
There is some Lucene code mixed into the snippet, which you don't need to worry about. Hopefully this is enough to set you on your way.

Similar Messages

  • Unusual string insertion question

    Hi everybody,
    I'm trying an experiment with "encoding" a text file, and I was wondering, is there a specific way to insert a random character, say, "&", into random places in a text string a random number of times? I've heard of Math.random, but I don't think that would do what I'm looking for, can anyone help me out?
    Thanks,
    Jezzica85

    Cipher.java
    import java.util.*;
    public abstract class Cipher {
      public String encrypt(String s) {
        StringBuffer result = new StringBuffer("");
        StringTokenizer words = new StringTokenizer(s);
        while (words.hasMoreTokens()) {
          result.append(encode(words.nextToken()) + " ");
        }
        return result.toString();
      }
      public String decrypt(String s) {
        StringBuffer result = new StringBuffer("");
        StringTokenizer words = new StringTokenizer(s);
        while (words.hasMoreTokens()) {
          result.append(decode(words.nextToken()) + " ");
        }
        return result.toString();
      }
      public abstract String encode(String word);
      public abstract String decode(String word);
    }
    Caesar.java
    public class Caesar extends Cipher {
      // Shifts each letter forward by 3; assumes lowercase a-z input only
      public String encode(String word) {
        StringBuffer result = new StringBuffer();
        for (int k = 0; k < word.length(); k++) {
          char ch = word.charAt(k);
          ch = (char)('a' + (ch - 'a' + 3) % 26);
          result.append(ch);
        }
        return result.toString();
      }
      // Shifts each letter forward by 23, which undoes the shift of 3
      public String decode(String word) {
        StringBuffer result = new StringBuffer();
        for (int k = 0; k < word.length(); k++) {
          char ch = word.charAt(k);
          ch = (char)('a' + (ch - 'a' + 23) % 26);
          result.append(ch);
        }
        return result.toString();
      }
    }
    TestEncrypt.java
    public class TestEncrypt {
      public static void main(String argv[]) {
        Caesar caesar = new Caesar();
        //here's the message
        String plain = "this is the secret message";
        //encrypt the message
        String secret = caesar.encrypt(plain);
        System.out.println(" ********* Caesar Cipher Encryption *********");
        System.out.println("PlainText: " + plain);
        System.out.println("Encrypted: " + secret);
        System.out.println("Decrypted: " + caesar.decrypt(secret));
      }
    }
    Message was edited by:
    fastmike
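
    For the random insertion that was actually asked about, java.util.Random is enough. A minimal sketch, assuming "a random number of times" means between 1 and some upper bound you pick; the class name, method name and the maxInserts parameter are made up for illustration.

```java
import java.util.Random;

public class RandomInsert {
    // Inserts c at random positions in s, between 1 and maxInserts times.
    static String insertRandomly(String s, char c, int maxInserts, Random rnd) {
        StringBuilder sb = new StringBuilder(s);
        int count = 1 + rnd.nextInt(maxInserts);      // 1..maxInserts
        for (int i = 0; i < count; i++) {
            int pos = rnd.nextInt(sb.length() + 1);   // 0..length, so the end is allowed too
            sb.insert(pos, c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String plain = "this is the secret message";
        System.out.println(insertRandomly(plain, '&', 5, new Random()));
    }
}
```

    Removing every '&' from the result gives back the original text, as long as the original contains no '&' of its own.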

  • Two String questions

    Hi!
    I have two questions:
    1) Can I do toString() on a null value?
    2) Is there any difference in writing:
    String s = new String();
    and
    String s;

    > isnt it true that in the first case s -object of
    > String class is created??
    Yes, the empty string, length zero, as he said.
    > in second case just defining a variable type String??
    Yes. It will either be null or unassigned, depending on whether it's a member variable or a local variable: a member defaults to null, while a local must be assigned before the compiler lets you read it.
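
    On the first question: calling toString() on a null reference throws a NullPointerException, while String.valueOf(Object) returns the text "null" instead. A minimal sketch with made-up class and method names:

```java
public class NullToString {
    // String.valueOf(Object) returns "null" for a null reference instead of throwing.
    static String safeToString(Object o) {
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        Object nothing = null;
        System.out.println(safeToString(nothing));   // prints: null

        String empty = new String();                 // a real object of length 0
        System.out.println(empty.length());          // prints: 0

        try {
            nothing.toString();                      // method call on a null reference
        } catch (NullPointerException e) {
            System.out.println("toString() on null throws NullPointerException");
        }
    }
}
```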

  • Convert string question... "\"

    I tried the code below in a JSP, but the result is: a , a\b
    I want the result to be: a\\b, a\\\\b
    Thanks !
    <%
         String str = "a\b, a\\b";
         String outStr = "";
         for (int i=0;i<str.length();i++){
              char c = str.charAt(i);
              if (c == '\\'){
                   outStr += "\\";
              }
              else{
                   outStr += c;
              }
         }
    %>
    <%=outStr%>

    Right, because in a Java string literal the backslash is the escape character: "\\" means one backslash, and "\b" in your first value is a backspace, not a backslash followed by b. (If you tried to just use a String "\", you'd get a compilation error, since the compiler would think you were trying to escape the closing quotation mark.)
    I think you're going to need to use RegExp... this question has been asked a bunch before, so search through the forums for an answer.
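
    For what it's worth, a plain literal replacement may be enough here, with no regular expression involved. A minimal sketch, assuming the goal is to double every backslash already present in the string; the class and method names are made up. Note that the input literal has to be written with "\\" to contain a backslash at all.

```java
public class EscapeBackslashes {
    // String.replace(CharSequence, CharSequence) does a literal replacement:
    // each single backslash becomes two.
    static String doubleBackslashes(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String str = "a\\b, a\\\\b";                  // the text:  a\b, a\\b
        System.out.println(doubleBackslashes(str));   // prints:    a\\b, a\\\\b
    }
}
```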

  • Number - string question

    Hello,
    I have the following problem:
    I have a cursor that contains the following:
    WHERE LKP_INDUSTRY1_ID IN ( p_sector )
    p_sector is being delivered in varchar: '23;22;36'
    My question is: is it possible to manipulate p_sector in a way that I can paste it in my cursor? I can replace the ';' with ',' but I don't know how I can make it read the numbers as numbers and not as strings.
    Thanks in advance.
    Oli

    Does this solve your problem ?
    SQL> create or replace type tab_n is table of number;
      2  /
    Type created.
    SQL> declare
      2   p_sector varchar2(20) := '7369;7499;7521';
      3   t tab_n := tab_n();
      4   element number;
      5   pos integer := 0;
      6   cursor a is select ename from emp where empno in (select * from table(t));
      7  begin
      8   for i in 1..length(translate(p_sector || ';',';0123456789',';')) loop
      9    t.extend;
    10    t(t.count) := substr(p_sector,pos+1,instr(p_sector || ';',';',1,i)-pos-1);
    11    pos := instr(p_sector,';',1,i);
    12   end loop;
    13   for v in a loop
    14    dbms_output.put_line(v.ename);
    15   end loop;
    16  end;
    17  /
    SMITH
    ALLEN
    WARD
    PL/SQL procedure successfully completed.
    Rgds.

  • Concat substring and string question

    Hello,
          So I am trying to add together a string \\fafs10\home and the concat of a substring of the first initial of a GivenName and LastName. I keep getting a NaN error.
    Here's what I have:
    =string("\\fafs10\home\") + concat(substring(GivenName,0,1), LastName)
    I want to return this as a default value to my list.
    Any help would be greatly appreciated. Thank you.
    Matthew

    Hi
    are you doing this in a calculated column or within a rule on an InfoPath form or another way?
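    The + operator is numeric in XPath, which is why adding two strings gives NaN. You should probably just go with
    concat("\\fafs10\home\", substring(GivenName, 1, 1), LastName)
    (substring positions start at 1 in XPath, so substring(GivenName,0,1) returns an empty string)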
    Regards
    Sergio Giusti

  • A few basic string Questions

    I would like to know how to make a string with the same characters as another string. Also, how can I set an int with the same value as there are characters in a string. It would really help if you gave me an example, because I am new to java and pretty much lost.

    > That page has a lot of information, but I really don't
    > know how to use any of it. It would be really helpful
    > if someone gave me an example. I see something like
    > "int length ( )", but I don't know how to use it.
    You don't know how to call a method? Then you need to start from the very beginning:
    Sun's basic Java tutorial
    Sun's New To Java Center. Includes an overview of what Java is, instructions for setting up Java, an intro to programming (that includes links to the above tutorial or to parts of it), quizzes, a list of resources, and info on certification and courses.
    http://javaalmanac.com. A couple dozen code examples that supplement The Java Developers Almanac.
    jGuru. A general Java resource site. Includes FAQs, forums, courses, more.
    JavaRanch. To quote the tagline on their homepage: "a friendly place for Java greenhorns." FAQs, forums (moderated, I believe), sample code, all kinds of goodies for newbies. From what I've heard, they live up to the "friendly" claim.
    Bruce Eckel's Thinking in Java (Available online.)
    Joshua Bloch's Effective Java
    Bert Bates and Kathy Sierra's Head First Java.
    James Gosling's The Java Programming Language. Gosling is the creator of Java. It doesn't get much more authoritative than this.
    Here's a freebie though:
    String str = ...;
    int len = str.length();
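
    A slightly fuller version of that freebie, covering both parts of the question. This is a minimal sketch with made-up class and method names. Because Strings are immutable, simply assigning one variable to another is usually all the "copy" you need; new String(s) is shown only because the question asks for a second string with the same characters.

```java
public class StringBasics {
    // Returns a new String object holding the same characters as s.
    static String copyOf(String s) {
        return new String(s);
    }

    // Returns the number of characters in s.
    static int countChars(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        String str = "hello";
        String same = copyOf(str);
        int len = countChars(str);
        System.out.println(same + " has " + len + " characters");
    }
}
```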

  • String question

    is there a way to remove a character from a string, for example:
    "< head>"
    and do some sort of trim to it to create
    "<head>"
    thanks

    halfpipehippie wrote:
    > is there a way to remove a character from a string, for example:
    > "< head>"
    > and do some sort of trim to it to create
    > "<head>"
    > thanks
    replaceAll(...) can handle this quite easily:
    String text = "abc <head > def </ head > ghi";
    System.out.println(text.replaceAll("\\s++(?=[^<>]*+>)", ""));
    But when your tags contain attributes, you can't use the code above. But, you would have mentioned such an important piece of information in your original post, right?

  • String question regarding "\"

    i am trying to read in a CSV (comma separated value) file containing text like this
    "2007/10/04","22:47:24","C:\test\tp2c266b.BAT","deleted","",""
    "2007/10/04","22:48:06","C:\Program Files\Common Files\Symantec Shared\CCPD-LC\symlcrst.dll","changed","",""
    "2007/10/04","22:48:19","C:\PROGRA~1\Symantec\LIVEUP~1\ludirloc.dat","changed","",""
    Using a CSV parser from:
    http://ostermiller.org/utils/CSV.html
    This code is easy to use and looks just like the example shown.... However when i parse out the array and print it to the screen the output for the path name looks like this:
    C:testtp2c266b.BAT
    C:Program FilesCommon FilesSymantec SharedCCPD-LCsymlcrst.dll ---- no \ is there
    i realize that the \ is an escape character but i need it, without it the directory listing is pointless.... please help, i have no idea what to do
    How do i read in the \ from the file and keep the dang thing..... or replace it with a / anything is better than no indicator of directory structure

    CODE:
    import com.Ostermiller.util.CSVParser;
    import java.io.InputStreamReader;
    import java.io.FileInputStream;
    import java.io.*;
    import hansen.filespy;
    import java.net.*;
    public class EventReader {
        public static void main(String[] args) throws Exception {
            java.net.InetAddress i = java.net.InetAddress.getLocalHost();
            getMachineID ID = new getMachineID();
            int machineid = ID.getIT(i.getHostName());
            while (true) {
                runit(machineid);
                try { Thread.sleep(10000); } catch (Exception e) {} //sleep for 10sec
            }
        }
        public static void runit(int machineid) throws Exception {
            filespy spy = new filespy();
            File f = new File("C:\\WINDOWS\\system32\\scl.csv");
            //test code ---- usb key??
            File[] roots = File.listRoots();
            for (File root : roots) {
                System.out.println(root);
            }
            //test code ---- end
            if (f.exists()) {
                CSVParser shredder = new CSVParser(new InputStreamReader(new FileInputStream("C:\\WINDOWS\\system32\\scl.csv")));
                String[] t;
                //remove header
                String[] header; //------------------------------create string array to contain stings
                header = shredder.getLine();
                while ((t = shredder.getLine()) != null) {
                    System.out.println(t[1]); //date //-----------------------contains date
                    System.out.println(t[2]); //file //------------------------contains filepath ------ however it shows C:testprogram files instead of C:\test\program files
                    System.out.println(t[3]); //event //-----------contains event
                    System.out.println(t[4]); //empty line
                    spy.spy(t[1], t[2], t[3], "Non-System", machineid); //--------------sending to outside class for further parsing.... see code example from above for what im doing
                }
                shredder.close();
                boolean f1 = new File("C:\\WINDOWS\\system32\\scl.csv").delete();
                if (!f1) {
                    System.out.println("failed to delete, file not there");
                }
            }//end if
            else {
                System.out.println("no events");
            }
        }
    }
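
    One way around the problem, going by the behaviour described in the post (the parser consuming \ as an escape character), is to split the lines yourself so that backslashes are kept literally. The sketch below is a hypothetical stand-in written against plain Java, not part of the Ostermiller library. It assumes each record sits on a single line, that fields are wrapped in double quotes, and that a doubled quote ("") inside a field means one quote character.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleCsvLine {
    // Splits one CSV line into fields. Backslashes are ordinary characters here.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');   // doubled quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());        // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"2007/10/04\",\"22:47:24\",\"C:\\test\\tp2c266b.BAT\",\"deleted\",\"\",\"\"";
        System.out.println(split(line).get(2));
    }
}
```

    With lines read through a BufferedReader, each call to split(...) returns the fields in order, so the path is at index 2 in the sample data.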

  • Quick string question finding if a string contains a character

    hello peeps
    is there a quick way of checking if a string contains a specific character
    e.g.
    myString="test:test";
    myString.contains(":");
    would be true
    any ideas on a quick way of doing it

    > is there a contains() method in 1.4.2? i couldnt see
    > it in the docs
    No there isn't. But the 1.5 has a contains(CharSequence s) method.
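
    On 1.4.2, indexOf does the same job: it returns -1 when the character is not found. A minimal sketch with made-up class and method names:

```java
public class ContainsChar {
    // True if c occurs anywhere in s; indexOf returns -1 when it does not.
    static boolean containsChar(String s, char c) {
        return s.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        String myString = "test:test";
        System.out.println(containsChar(myString, ':'));   // prints: true
        System.out.println(containsChar("test", ':'));     // prints: false
    }
}
```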

  • Sorting String Question

    Hi guys, I'm still learning the intricacies of Java at the moment so bear with me, as I'm sure the answer is going to be simple as heck ...
    I'm trying to sort a String, where I can make the letters in order. Basically, if the String is "ddeab" then it should be converted to "abdde".
    So far, I'm using an int variable to get the value of the char in first and second letter ( via charAt to pick the position ) and swap them if they are lower. However, this is where I'm stuck - how do I get them to "swap" in a string?
    Edited by: Phoom on Dec 1, 2007 11:58 AM

    Phoom wrote:
    > After a quick reading on Strings, I found toCharArray() useful in this case. I guess I can convert it back to string format by using a for loop and make it add to a string.
    > ...
    No need to loop over that array: have a look at the various constructors of the String class.
    And unless it's a (homework) requirement to implement your own sorting algorithm, have a look at java.util.Arrays' sort methods.
    Good luck.
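
    Put together, the two hints above come to only a few lines. A minimal sketch, assuming you are allowed to use the library sort; the class and method names are made up.

```java
import java.util.Arrays;

public class SortLetters {
    // Sorts the characters of s into ascending order and returns them as a new String.
    static String sortLetters(String s) {
        char[] chars = s.toCharArray();   // Strings are immutable, so work on a copy
        Arrays.sort(chars);
        return new String(chars);         // the String(char[]) constructor, no loop needed
    }

    public static void main(String[] args) {
        System.out.println(sortLetters("ddeab"));   // prints: abdde
    }
}
```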

  • Quick String Question

    A co-worker had a query he was running with a where clause something like this...
    Where
    Name='BLAH & BLAHBLAH'
    How do I prevent SQL Developer from treating the & symbol as a request for a variable prompt?
    It would prompt for a variable named BLAHBLAH...
    Thanks
    Obe
    ps. So far the Java SDK 6_10 is working...

    In SQL*Plus you can "SET DEFINE OFF" prior to the SQL statement to get it to ignore the ampersand, but as far as I know (and I may be wrong), you can't do that in SQL Developer Worksheet.
    What I normally do in this situation is break the string up and use "||" and "CHR". To use your example:
    WHERE name = 'BLAH ' || CHR(38) || ' BLAHBLAH'
    Ed. H.

  • Very simple Strings Question

    Hi All:
    I am stuck with a small situation, I would appreciate if someone can help me with that. I have 2 strings:
    String 1 - "abc"
    String 2 - "I want to check if abc is in the middle"
    How can I check if the string 1 "abc" is in the middle of the string 2. I mean the String 2 does not start with "abc" and it does not end with "abc", I want to check if it is somewhere between the string.
    Thanks,
    Kunal

    int i = s2.indexOf(s1);
    if((i > 0) && ((i + s1.length()) < s2.length())) {
       // somewhere in the middle
    } else if(i == 0) {
       // start
    } else if((i + s1.length()) == s2.length()) {
       // end
    } else if(i == -1) {
       // nowhere
    }

  • Urgent String question

    The effect of repeated concatenation of Strings can cause a lot of unwanted objects in memory. If I am creating a query statement as follows:
    String query = "Select * from"+tablename+"where something "+x+"something"+y.......(All in a single statement)
    Will this also create a lot of unwanted objects before query finally gets its value? Should I do this concatenation with a StringBuffer?

    I found the following listing and hope it helps...
    http://java.sun.com/docs/books/jls/second_edition/html/expressions.doc.html#39990
    15.18.1.2 Optimization of String Concatenation
    An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.
    For primitive types, an implementation may also optimize away the creation of a wrapper object by converting directly from a primitive type to a string.
    Hope this also helps.
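
    In other words, a single concatenation expression is something the compiler is allowed to optimize on its own. Where an explicit buffer tends to pay off is when the string is built up piece by piece in a loop. A minimal sketch of that case, using StringBuilder (the unsynchronized counterpart of StringBuffer, available from Java 5). The class name, method name and the shape of the query are made up for illustration, and values coming from users belong in a PreparedStatement rather than in concatenated SQL.

```java
public class QueryBuilder {
    // Builds "SELECT * FROM <table> WHERE c1 AND c2 ..." with a single buffer.
    static String selectAll(String table, String[] conditions) {
        StringBuilder sb = new StringBuilder("SELECT * FROM ").append(table);
        for (int i = 0; i < conditions.length; i++) {
            sb.append(i == 0 ? " WHERE " : " AND ").append(conditions[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = selectAll("emp", new String[] {"deptno = 10", "sal > 1000"});
        System.out.println(query);
    }
}
```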

  • A slightly different backlight question

    I've heard some other people have had a problem with the backlight of their iBooks turning off and then not coming back. I've had the same problem but it tends to happen when I adjust the screen. I'll push it back slightly and it will turn off. If I sleep it, it will sometimes come back but usually I have to shut it down, close the lid, open it up and start up again. Most of the time, it comes back. I've tried zapping the PRAM and resetting the PMU. Anything else to be done? Or might I have a frayed wire or something between my computer and screen which becomes disabled when I move the screen? It's an old computer and I don't really want to take it in for hardware repairs if I can avoid it but it's been a major problem. Thanks.

    Hi, and welcome to Apple Discussions.
    You are not alone.
    Dominion Tech in Colchester, Vermont used to do the repair for $79.90. You may want to call and see if the price still stands. You may also want to check your local Yellow Pages to see if someone will meet that price to save the shipping charges.
