Text Content of Document is returned as null

Hi All,
I am trying to use the JTidy parser to parse an input HTML string. But when I try to print the text content of the Document, it returns null. I am new to DOM parsing, so is there anything that I am doing wrong? Any pointers would be very helpful.
Here is my code:
public static void main(String[] args) {
          String rawHtml = "<p class=\"MsoNormal\" style=\"text-autospace:none;\"><font color=\"black\"><span style=\"color:black;\">???</span></font><b><font color=\"#7f0055\"><span style=\"color:#7f0055;font-weight:bold;\">private</span></font></b><font color=\"black\"><span style=\"color:black;\"> String parseDescription</span></font><font>";
          Tidy tidy = new Tidy();     //obtain a new Tidy parser instance
          tidy.setPrintBodyOnly(true);
          ByteArrayOutputStream baos = new ByteArrayOutputStream();
          PrintStream ps = new PrintStream(baos);
          byte[] bytes = rawHtml.getBytes();
          ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
          //returns HTML Document
          Document htmlDoc = tidy.parseDOM(bais, ps);
          String docText = htmlDoc.getTextContent();
          try {
               System.out.print(baos.toString("UTF8"));
               System.out.println();
               try {
                    System.out.print(docText);
               } catch (DOMException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
               }
          } catch (UnsupportedEncodingException e) {
               // TODO Auto-generated catch block
               e.printStackTrace();
          }
}
Here is the output:
line 1 column 302 - Warning: missing </font>
line 1 column 302 - Warning: trimming empty <font>
line 1 column 303 - Warning: inserting missing 'title' element
InputStream: Document content looks like HTML 4.01 Transitional
3 warnings, no errors were found!
You are recommended to use CSS to specify the font and
properties such as its size and color. This will reduce
the size of HTML files and make them easier to maintain
compared with using <FONT> elements.
<p class="MsoNormal" style="text-autospace:none;"><font color=
"black"><span style="color:black;">???</span></font><b><font color=
"#7F0055"><span style=
"color:#7f0055;font-weight:bold;">private</span></font></b><font
color="black"><span style="color:black;">String
parseDescription</span></font></p>
null

getTextContent() is a DOM Level 3 recommendation; it may not be supported by every DOM parser, and it is not supported by JTidy (yet). One way around this is to write your own method that does the same thing. Alternatively, you can chain the now well-formed output of JTidy to a full-fledged DOM parser such as oracle.xml.parser.v2.DOMParser. Note that even with a DOM Level 3 parser, getTextContent() is defined to return null on a Document node itself, so call it on the document element instead. The details look like this.
[0] Add oracle xdk's xmlparserv2.jar to the classpath.
[1] Add the import to the program for convenience.
import oracle.xml.parser.v2.*;
[2] Then chain the parsers to make getTextContent() available.
DOMParser parser=new DOMParser();
try {
    parser.parse(new ByteArrayInputStream(baos.toByteArray()));
    Document xdoc=parser.getDocument();
    System.out.println(xdoc.getDocumentElement().getTextContent());
} catch (Exception e) {
    e.printStackTrace();
}
[3] With xdoc, you can do more things to your liking, as it is now a full-fledged DOM tree.
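
The other option mentioned above, writing your own method, can be sketched roughly as follows. This is only a minimal illustration, not JTidy's or Oracle's code: the class name TextContentSketch and its methods are made up for this example, and the demo parses a small well-formed string with the JDK's built-in parser so that it runs without JTidy on the classpath. Since textOf() relies only on getNodeType(), getNodeValue() and getChildNodes(), it should in principle also accept the htmlDoc returned by tidy.parseDOM(), but that has not been verified here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: collects text using only basic DOM calls. */
final class TextContentSketch {

    /** Concatenates the values of all text and CDATA nodes under the given node. */
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            sb.append(node.getNodeValue());
            return;
        }
        // Comments and processing instructions contribute no text.
        if (type == Node.COMMENT_NODE || type == Node.PROCESSING_INSTRUCTION_NODE) {
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), sb);
        }
    }

    /** Demo-only convenience: parses well-formed markup with the JDK parser. */
    static String textOfMarkup(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return textOf(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOfMarkup("<p><span>private</span> String parseDescription</p>"));
    }
}
```

Unlike getTextContent(), this sketch also descends from a Document node, so passing the document itself does not yield null.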

Similar Messages

  • Open xml relationship target is NULL when inserting chart into a mapped rich text content control in Word 2013

    Hi,
    I have a word document with a rich text content control that is mapped to a CustomXml. You can download an example here
    http://1drv.ms/1raxoUr
    I have looked into the specification ISO/IEC 29500-1:2012 and I understand that the Target attribute of the Relationship element can be set to NULL at times (the empty header and footer in the specification).
    Now I have stumbled on Target also being NULL when inserting a diagram into a Word document. For example:
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject"
    Target="NULL" TargetMode="External" xmlns="http://schemas.openxmlformats.org/package/2006/relationships" />
    Why is Target="NULL", and how should I interpret a Target of NULL?
    Br,
    /Peter
    Peter

    Hello Peter,
    The relationship in question is associated with the externalData element (ISO/IEC 29500-1:2012 §21.2.2.63). For the other two charts in this document, the corresponding relationships are of the other allowable form:
      <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/package" Target="../embeddings/Microsoft_Excel_Worksheet1.xlsx"/>
      <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/package" Target="../embeddings/Microsoft_Excel_Worksheet2.xlsx"/>
    For charts 1 and 3 in your document, the data can be edited via the Chart Tools ribbon control. The option to edit data is not available for chart 2. The data used to create chart 2 is the same default spreadsheet data used for chart 1, and in fact the spreadsheet
    references are still present in the file format, despite there being no apparent link to a spreadsheet for chart 2.
    Thus, it appears that Target="NULL" in this context means that the chart is not associated with an external data source. The specification doesn't have much to say about the semantics of the Target attribute (ISO/IEC 29500-2:2012 §9.3.2.2) beyond
    requiring that it be a valid xsd:anyURI, which the string "NULL" is.
    It looks like there is some unexpected interaction between the chart and the content control. I don't think the file format is the issue. You will probably need to pursue that behavior from the product perspective via a support incident, if that behavior
    is unexpected. If you still have questions about what is seen in the file format, please let me know.
    Best regards,
    Matt Weber | Microsoft Open Specifications Team

  • I am losing my carriage returns when sending emails. How to keep? uses SMTP Email MIME Text Content-Type.vi

    I am losing my carriage returns when sending emails using the Internet email vi's.
    All carriage returns are stripped out and I get one long word wrapped paragraph.
    I want to avoid html.
    Ideally, using the vi's for rich text would be perfect, but a simple text message with carriage returns and line feeds in any font ok. 
    uses SMTP Email MIME Text Content-Type.vi
    I have tried text/plain, text/html, and mixed yada something

    You need to use the Line Feed constant and then the Concatenate Strings function. See the screenshot.
    Naqqash
    Attachments:
    Using Line feed.png ‏15 KB

  • In adobe reader app on iPad, I have a PDF document that I added notes and comments to.  Once I left the document and returned to it, the notes and comments were gone.  Where are they?  I clicked "save" and "done" buttons after I entered text.

    In the Adobe Reader app on iPad, I have a PDF document that I added notes and comments to.  Once I left the document and returned to it, the notes and comments were gone.  Where are they?  I clicked the "save" and "done" buttons after I entered text.

    The application auto-saves your input when you close the document.  If you left the document, as you state, the notes/comments should have been saved and should have been visible the next time you opened the document with the Mobile Reader (note that if you are opening the document with another app such as Apple's built in PDF Viewer, things like notes/comments may not be visible).  Also note that if you are doing an Open In... from another app (like Dropbox), the version of the document in Dropbox does not update; only the version of the document in the Mobile Reader is updated.
    Would it be possible to send a video of the problem you are encountering to [email protected] so that we can try to help?

  • Table of contents creation deletes text of entire document

    Every time I try to create a TOC for the document I'm working on, InDesign either tells me the TOC has been updated — in which case the entire text of the document has just disappeared — or else it loads the cursor and gives me nothing but the title "Contents". Am I doing something obviously wrong?
    (I've already done the IDML round-trip and the delete preferences.)
    Thanks! — Jeremy

    Thanks, Uwe.
    When I initially created the TOC, I put it into a separate story, and the TOC was empty, but the main text (which is in another single story) remained. Then when I updated the TOC, the main story was replaced with the word "Contents" (same as the TOC).
    So I have no clue what's going on either! It must be some sort of file corruption not repaired by the IDML round-trip. Your own bafflement suggested that file corruption was the problem rather than some silly mistake I was making.
    However, great news: I made a new document, pasted the main story into it, and now everything seems to be working in this new document.
    Many thanks again for your help, especially since I foolishly posted my question in the wrong forum.

  • Find the Missing font text content in indesign document

    How can I find and retrieve the content that uses a missing font in InDesign documents by using the SDK? Please give me some tips and techniques.
    Can anybody help me?

    It won't answer your question, but you could have this problem too:
    I have a problem in my current project where text frames containing no text make the document report usage of a font which we don't want the project to use (Times, for example). The problem is that InDesign reports usage of a font which is in fact not applied to any text. Weird!
    The easy fix is to identify those items (text frame class + no text) and reassign them as "unassigned".

  • Can i search a document for text in column a to return the entire row if column a matches in a separate sheet?

    Can i search multiple sheets in a document for text in column a to return the entire row if column a matches in a
    separate sheet?

    Thank you, Barry. That was helpful, and am hopeful that what I want to do is possible.
    I am creating a spreadsheet that currently has 20 sheets, of which certain sheets have more than one table.  I will be adding more sheets.  I would like to return results for all occurrences of the search string, preferably into a separate spreadsheet.  It would be perfect if that separate sheet updated whenever I update information in the first spreadsheet.  To give an example:
    Sheet:     baskets
         Tables:         straw
                             wire
    Sheet:      barware
         Tables:          glasses
                              decanters
                              coasters
    My tables all have the same titles:
    Vendor      Description     Cost     Selling Price
    Since I will have upwards of 100 sheets, with multiple tables, and most of my vendors will fit into multiple sheet categories, it would be helpful if I could also see what my order will be from each vendor, not just who I will be ordering each item from.  How would I do that? 
    I hope I conveyed that properly. 
    Thank you in advance for your help,
    Rana

  • Images present in datagridview not exporting to file only text contents are generating into PDF file..

    Hi Everyone,
       I have created a simple desktop app in which I am trying to generate a PDF file from a DataGridView. When I click the Export PDF button, the PDF file is generated successfully, but the images present in the DataGridView
    are not written to the PDF; only the text contents are present in the PDF file.
      Can anyone tell me how to generate the PDF file along with the images?
    Here is my code:
    private void btnexportPDF_Click(object sender, EventArgs e)
    {
        int ApplicationNameSize = 15;
        int datesize = 12;
        Document document = null;
        try
        {
            SaveFileDialog savefiledg = new SaveFileDialog();
            savefiledg.Filter = "All Files | *.* ";
            if (savefiledg.ShowDialog() == DialogResult.OK)
            {
                string path = savefiledg.FileName;
                document = new Document(PageSize.A4, 3, 3, 10, 5);
                PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(path + ".pdf", FileMode.Create));
                document.Open();
                // Creates a phrase to hold the application name at the left hand side of the header.
                Phrase phApplicationName = new Phrase("Sri Lakshmi Finance,Hosur-560068", FontFactory.GetFont("Arial", ApplicationNameSize, iTextSharp.text.Font.NORMAL));
                // Creates a phrase to show the current date at the right hand side of the header.
                Phrase phDate = new Phrase(DateTime.Now.ToLongDateString(), FontFactory.GetFont("Arial", datesize, iTextSharp.text.Font.NORMAL));
                document.Add(phApplicationName);
                document.Add(phDate);
                iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance("D:\\logo.JPG");
                document.Add(img);
                iTextSharp.text.Font font5= iTextSharp.text.FontFactory.GetFont(FontFactory.TIMES_ROMAN, 5);
                iTextSharp.text.Font font6 = iTextSharp.text.FontFactory.GetFont(FontFactory.HELVETICA_BOLD, 6);
                //float[] columnDefinitionSize = { 2.5f, 7.0f,6.6f, 8.6f, 6.6f, 5.0f, 4.5f, 7.0f, 6.3f, 7.0f, 3.5f, 6.0f, };
                PdfPTable table = null;
                table = new PdfPTable(dataGridView1.Columns.Count);
                table.WidthPercentage = 100;
                PdfPCell cell = null;
                foreach (DataGridViewColumn c in dataGridView1.Columns)
                {
                    cell = new PdfPCell(new Phrase(new Chunk(c.HeaderText,font6)));
                    cell.HorizontalAlignment = PdfPCell.ALIGN_CENTER;
                    cell.VerticalAlignment = PdfPCell.ALIGN_CENTER;
                    cell.BackgroundColor = new iTextSharp.text.BaseColor(240, 240, 240);
                    table.AddCell(cell);
                }
                if (dataGridView1.Rows.Count > 0)
                {
                    for (int i = 0; i < dataGridView1.Rows.Count; i++)
                    {
                        PdfPCell[] objcell = new PdfPCell[dataGridView1.Columns.Count];
                        for (int j = 0; j < dataGridView1.Columns.Count - 0; j++)
                        {
                            cell = new PdfPCell(new Phrase(dataGridView1.Rows[i].Cells[j].Value.ToString(), font5));
                            cell.HorizontalAlignment = PdfPCell.ALIGN_LEFT;
                            cell.VerticalAlignment = PdfPCell.ALIGN_LEFT;
                            cell.Padding = PdfPCell.ALIGN_LEFT;
                            objcell[j] = cell;
                        }
                        PdfPRow newrow = new PdfPRow(objcell);
                        table.Rows.Add(newrow);
                    }
                }
                document.Add(table);
                MessageBox.Show("PDF Generated Successfully");
                document.Close();
            }
            else
            {
                //Error
            }
        }
        catch (FileLoadException fle)
        {
            MessageBox.Show(fle.Message);
            MessageBox.Show("Error in PDF Generation", "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
        }
    }
    Runtime Gridview content:
    Generated PDF File:
    Thanks & Regards RAJENDRAN M

    Hello,
    Since this issue is mainly related to iTextSharp which belongs to third-party, I would recommend you consider posting this issue on its support website to get help.
    Maybe the following forum will help.
    http://support.itextpdf.com/forum/26
    Regards,
    Carl

  • Start script automatically after place Word text in indesign document

    I need to start a script automatically after placing Word text in an InDesign document.
    Is it possible?

    This is how I'd approach it - it captures a 'place' or 'paste' of a blob of text. It is not perfect - it will also 'fire' if you simply copy some text from one frame in the document to an empty frame, but I don't think that would be an issue in most cases.
    Create a page item on the pasteboard; use the Active Page Item Developer palette to set the List of Subjects to * (just an asterisk), and the Event Filter to subjectModified* ('subjectModified' with an asterisk appended). Set the attached ExtendScript to the script shown below.
    I've not tested this heavily - it's only a proof of concept...
    (function(theItem)
    {
      const kKeyForSavedTextLength = "com.rorohiko.savedTextLength";
      do
      {
        var theFrame = theItem.eventSource;
        if (! (theFrame instanceof TextFrame))
        {
          // Not a text frame - bail out
          break;
        }
        var previousTextLength =
          theFrame.getDataStore(
            kKeyForSavedTextLength);
        if (previousTextLength == null)
        {
          previousTextLength = 0;
        }
        if (previousTextLength > 0)
        {
          // Already has text in it - bail out
          break;
        }
        var curTextLength = theFrame.contents.length;
        theFrame.setDataStore(
          kKeyForSavedTextLength,
          curTextLength);
        if (curTextLength == 0)
        {
          // No text in it - bail out
          break;
        }
        alert("Pasted or placed a blob of text");
      }
      while (false);
    }
    (theItem));

  • Retrieving html text not html document from RichEditableText control

    Hi,
       I have some HTML text
    Ex- This is <b> test </b>
    which I am setting on a RichEditableText using TextConverter.importToFlow(). I want to apply only bold and italic formatting to this text and then get the same HTML text back with
    only the formatting I have applied.
    Suppose I have formatted this text in the RichEditableText like this:
    Ex- This <i>is</i> <b> test</b>
    But when I try to get this back from the RichEditableText using
      TextConverter.export(richedittxt.textFlow,TextConverter.TEXT_FIELD_HTML_FORMAT,ConversionType.STRING_TYPE);
    it gives a complete HTML document with <html><body><font> tags, which I don't want. It should not include all these extra tags.
    Can anyone suggest whether this is feasible or not? Please reply if anyone has a solution.
    Thanks..

    Leao,
    > I didn't write the original actionscript so I don't
    > really know how to implement what you suggested.
    Aha! Well, that can be a bit like painting yourself into a corner. ;)
    > Could you explain a little more?
    Spend a bit of time with the String class. Before you even do that, you may want to familiarize yourself with the concept of objects in general. See this introductory article ...
    http://www.quip.net/blog/2006/flash/actionscript-20/ojects-building-blocks
    ... then spend a bit of time in a completely new FLA experimenting with the String class. Many of its methods are specifically geared toward changing the value of the text content, including stripping away certain characters.
    e.g., you can use String.indexOf() to return the position of a particular character (or characters) in a string of text. You may then use that position in cahoots with String.substr() to return a new string that has been "pruned" of undesired content.
    David Stiller
    Adobe Community Expert
    Dev blog,
    http://www.quip.net/blog/
    "Luck is the residue of good design."

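The indexOf()-then-cut approach described in the reply above can be sketched as follows. The sketch is in Java rather than ActionScript (Java's String has substring() where ActionScript has substr()), and the class name MarkupPruner, the method between() and the sample string are all made up for illustration; the real output of TextConverter.export() may use different tags or letter case, so the markers would have to be adjusted to match it.

```java
/** Hypothetical helper illustrating indexOf() plus substring() pruning. */
final class MarkupPruner {

    /**
     * Returns the text between the first occurrence of start and the next
     * occurrence of end, or the input unchanged if either marker is missing.
     */
    static String between(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from < 0) {
            return text;
        }
        from += start.length();              // skip past the start marker itself
        int to = text.indexOf(end, from);    // search only after the start marker
        if (to < 0) {
            return text;
        }
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        // Made-up sample input, not actual TextConverter output.
        String exported = "<html><body>This <i>is</i> <b> test</b></body></html>";
        System.out.println(between(exported, "<body>", "</body>"));
    }
}
```

Returning the input unchanged when a marker is missing is a design choice for this sketch; throwing an exception would be equally reasonable if missing wrapper tags should count as an error.
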
  • Problems getting text content of text node?

    Hi all,
    I am currently somewhat puzzled why one of the simplest parts of my code doesn't work. I am trying to get the text content of a text node (XmlValue.getNodeType() correctly returns TEXT_NODE) in Java, but what I get is always the text of the entire document retrieved from the database (no matter what text node I consult in the entire document!). Is this expected behaviour? The method I am using to retrieve the data is XmlValue.getNodeValue(), but playing around with other functions didn't show any success either.
    So my question is: what am I doing wrong here? I hope someone in here can help me with my rather stupid problem :-)
    Thanks for your help,
    Alex

    Hello Rucong,
    thanks for your quick answer. I am sorry for the delay - I was away for a couple of days.
    I boiled down my code to a simple example - however can't believe it's the query itself (since it's really basic). But maybe you see something I am missing?
    Here is my sample code that exhibits this behaviour:
              XmlManagerConfig managerConfig = new XmlManagerConfig();
              EnvironmentConfig envConfig = new EnvironmentConfig();
              envConfig.setAllowCreate(false);
              envConfig.setInitializeCache(true);
              envConfig.setInitializeLocking(false);
              envConfig.setInitializeLogging(true);
              envConfig.setTransactional(false);
              envConfig.setErrorStream(System.out);
              Environment env = new Environment(new File("D:\\dbhome"), envConfig);
              XmlManager manager = new XmlManager(env, managerConfig);
              XmlManager.setLogLevel(XmlManager.LEVEL_ALL, true);
              XmlManager.setLogCategory(XmlManager.CATEGORY_ALL, true);
              XmlContainerConfig containerConfig = new XmlContainerConfig();
              containerConfig.setAllowCreate(true);
              XmlContainer container = manager.openContainer("mpeg7samples.dbxml", containerConfig);
              XmlQueryContext context = container.getManager().createQueryContext();
              String query = "for $a in collection('mpeg7samples.dbxml') return $a";
              XmlResults result = container.getManager().query(query, context);
              while (result.hasNext()) {
                   XmlValue v = result.next();
                   if (v.getNodeType() != XmlValue.DOCUMENT_NODE)
                        throw new Exception();
                   v = v.getFirstChild();
                   if (v.getNodeType() != XmlValue.TEXT_NODE)
                        throw new Exception();
                   System.out.println(v.getNodeValue());
                   break;
              }
              container.close();
              manager.close();
    This code executes a simple query and then dumps the first text node of the first retrieved document. As stated before the System.out.println() call dumps the text of the entire document (including <?xml...?> at the beginning and so on) though.
    Any ideas?
    Thanks in advance for your help,
    Alex

  • Trouble with the header content of php script - return wrong character: ?�?

    Hello everybody!
    I have a problem reading the text value returned by a PHP script.
    The PHP script contains something like this:
    ===========================================
         $new1 = ereg_replace(".*OUTPUT url=\"(.{40,70})\" ?/>.*", "\\1",$XMLResponse);
         $size = strlen($new1);
         header("Content-Type: text/plain");
         header("Content-Length: " .$size);
         echo $new1;
    ==========================================
    $new1 includes url address to picture.
    Ok, in my java applet I'm using the method readLine() of DataInputStream Class, where I'm reading the return text value of the mentioned php script. The code is here:
    ==========================================
              try {
                   dis = new DataInputStream(url.openStream());
                   str = dis.readLine();
              } catch (IOException e2) {
              }
    ==========================================
    but the response is the text plus these characters at the beginning: ?�?
    What is the trouble here? Thank you
    Jan Zitniak :)

    Hello,
    thank you for your idea, but I resolved problem with this code:
              URL urlReal = null;
              try {
                   urlReal = new URL (str); // convert String to URL
              } catch (MalformedURLException e) {
                   // TODO Auto-generated catch block
                   e.printStackTrace();
              }
    In that code I "converted" a string to a URL, where the string happened to be a real URL address.
    Jan :)
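
The thread never says what the three stray characters were. One possibility, and it is only a guess, is a UTF-8 byte order mark (bytes EF BB BF) sent ahead of the PHP output. DataInputStream.readLine() is deprecated because it does not convert bytes to characters properly, so a rough alternative is to decode the stream explicitly and drop a leading U+FEFF if one is present. The class name FirstLineReader and the sample bytes below are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads the first line of a UTF-8 response, minus any BOM. */
final class FirstLineReader {

    static String firstLine(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line = reader.readLine();
        if (line == null) {
            return "";
        }
        // Java's UTF-8 decoder keeps a BOM as the single character U+FEFF.
        if (!line.isEmpty() && line.charAt(0) == '\uFEFF') {
            line = line.substring(1);
        }
        return line;
    }

    /** Demo-only overload so the behavior can be checked without a network call. */
    static String firstLine(byte[] body) {
        try {
            return firstLine(new ByteArrayInputStream(body));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample response: a BOM followed by "ok" and a newline.
        byte[] body = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k', '\n'};
        System.out.println(firstLine(body));
    }
}
```

In the applet, the stream passed in would be url.openStream(). If the stray bytes turn out to be something other than a BOM, this sketch will leave them in place.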

  • Problems with string encoding - need the text content in char* format.

    The problem is non-ASCII characters, which come out as some sort of Unicode I need to decipher.
    Here's what I got:
    A text frame object with the TextString "Agnartjørna"
    I get the text content of this object into an ai::UnicodeString the following way:
    AIErr
    VMGetTextOfTextArt( AIArtHandle textArt, ai::UnicodeString &ucStr)
    {
        ASUnicode *textBuffer = NULL;
        AITRY {
            TextFrameRef ateTextRef;
            AIX( sAITextFrame->GetATETextFrame( textArt, &ateTextRef));
            ATE::ITextFrame ateText( ateTextRef);
            ATE::ITextRange ateRange = ateText.GetTextRange( true);
            ASInt32 textLen = ateRange.GetSize();
            AIX( sSPBlocks->AllocateBlock( (textLen+2) * sizeof( ASUnicode), nil, (void**) &textBuffer));
            ateRange.GetContents( textBuffer, (ASInt32) textLen+1);
            /* trim off trailing newlines */
            if ((textBuffer[textLen] == '\n') || (textBuffer[textLen] == '\r'))
                 textBuffer[textLen] = 0;
            ucStr.clear();
            ucStr.append( ai::UnicodeString( textBuffer, textLen));
            sSPBlocks->FreeBlock( textBuffer);
            textBuffer = NULL;
            AIRETURN;
        }
        AICATCH {
            if (textBuffer) sSPBlocks->FreeBlock( textBuffer);
            AIPROPAGATE;
        }
    }
    Now, the next step is to convert it into a form that I can use to call regexp.
    Basically, I want to detect the ending "tjørna" (meaning small lake) on a map label, and apply a standard abbreviation "tj^a" (with "a" superscripted).
    So the problem is to obtain the regexp pattern and the text content in the same encoding.  And since the regexp library is old and char*-based, I would like to convert the text content into plain old char*.
    Hence the following code:
    static AIErr
    VMAbbreviateTextArt( AIArtHandle textArt,
                             vmTextAbbrevEffectParams *params)
        AITRY {
        /* first obtain the text contents of the textArt */
           ai::UnicodeString ucText;
          const int kTextLen = 256;
          char textContent[kTextLen];
          AIX( VMGetTextOfTextArt( textArt, ucText));
          ucText.as_Roman( textContent, kTextLen);
    But textContent now has the value "Agnartj\xbfnna"  (According to XCode),
    which will not get a match on the pattern "tj([øe][rn])na\\" (with  backslash matching the end of the string)
    Are there any other ways to convert the textContent to a plain char* string?

    Thank you very much, your method will work fine. with
    the "UTF-8" parameter the byte[].length is double,
    cause every valid byte is preceded by a -62, but I
    will just filter the valid bytes into a new array.
    Thanks again,
    Stefan

    Actually what you need to do is to find the character encoding that your device expects, and then you can code your strings in Arabic.
    That's the way Java does things; Strings and char values are always in Unicode (see www.unicode.org) (which means \u0600 to \u06ff for Arabic), and a specified character encoding is used when translating these to and from a byte stream.
    Each national character encoding has a name. Most of them are identical to ASCII for 0-127 and code their national characters in 128-255.
    Find the encoding name for your display and, odds are, the JRE has it in the library.
    BTW, the character encoding ISO-8859-1 simply maps Unicode characters 0-255 onto bytes.
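
    The -62 mentioned earlier is the signed form of 0xC2, the UTF-8 lead byte for characters U+0080 to U+00BF. A minimal sketch of the difference between the two encodings, assuming two sample characters from that range (the class and method names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {
    // ISO-8859-1: one byte per character for U+0000 to U+00FF.
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // UTF-8: characters above U+007F take two or more bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Copyright sign (U+00A9) and degree sign (U+00B0), sample characters only.
        String s = "\u00a9\u00b0";
        System.out.println(Arrays.toString(latin1(s))); // [-87, -80]
        System.out.println(Arrays.toString(utf8(s)));   // [-62, -87, -62, -80]
    }
}
```

    Choosing the encoding the device expects avoids the need to filter bytes afterwards.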

  • XML Publisher Report - Invalid character was found in text content

    Hi Techies,
    Version Background
    Oracle apps : 11.5.10
    Oracle 9i Database
    Oracle Reports 6i
    I created an XML output type concurrent program and attached a data definition and template to it.
    My program completed with status "Warning".
    The error is: An invalid character was found in text content.
    Then I downloaded the XML and opened it in Notepad++. I found two odd characters in it.
    They are non-ASCII characters, so I am not able to paste them into this forum text field. They look like a double-sided arrow and a forward arrow.
    I also tried loading the XML locally from the RTF template. Again it throws the same error:
    Error No: -1072896760: An invalid character was found in text content.
    Additional Information:
    Data is coming from table "gl_alloc_batches.description"
    Encoding Type: UTF-8
    Please help me handle such non-ASCII characters.
    Edited by: 868779 on Feb 22, 2012 10:48 PM

    Hi,
    Please find below a PL/SQL block that finds the special characters in a column of a table:
    SET serveroutput ON size 1000000
    DECLARE
    PROCEDURE gooey (v_table VARCHAR2, v_column VARCHAR2)
    IS
    TYPE t_id IS TABLE OF NUMBER;
    TYPE t_dump IS TABLE OF VARCHAR2 (20000);
    TYPE t_data IS TABLE OF VARCHAR2 (20000);
    l_id t_id;
    l_data t_data;
    l_dump t_dump;
    CURSOR a
    IS
    SELECT DISTINCT column_name
    FROM dba_tab_columns
    WHERE table_name = v_table
    AND data_type = 'VARCHAR2'
    AND column_name NOT IN ('CUSTOMER_KEY', 'ADDRESS_KEY');
    BEGIN
    FOR x IN a
    LOOP
    l_id := NULL;
    l_data := NULL;
    l_dump := NULL;
    EXECUTE IMMEDIATE 'SELECT '
    || v_column
    || ', '
    || x.column_name
    || ', '
    || 'dump('
    || x.column_name
    || ')'
    || ' FROM '
    || v_table
    || ' WHERE RTRIM((LTRIM(REPLACE(TRANSLATE('
    || x.column_name
    || ',''ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@#$%^&*()_+
    -=,!\`~{}./?:";''''[ ]'',''A''), ''A'', '''')))) IS NOT NULL'
    BULK COLLECT INTO l_id, l_data, l_dump;
    IF l_id IS NOT NULL
    THEN
    FOR k IN 1 .. l_id.COUNT
    LOOP
    DBMS_OUTPUT.put_line ( v_table
    || ' - '
    || x.column_name
    || ' - '
    || TO_CHAR (l_id (k), '999999999999'));
    DBMS_OUTPUT.put_line (l_data (k));
    DBMS_OUTPUT.put_line (l_dump (k));
    DBMS_OUTPUT.put_line ('*********************');
    END LOOP;
    END IF;
    END LOOP;
    END gooey;
    BEGIN
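    -- The second argument is fetched into a NUMBER collection (l_id), so pass a numeric key column if DESCRIPTION is not numeric.
    gooey ('GL_ALLOC_BATCHES', 'DESCRIPTION');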
    END;
    Thanks,
    Amogh
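
    If the offending characters turn out to be control characters (an assumption, since the actual characters are not shown in the post), they fall outside the set XML 1.0 allows in text content, which would explain the parser error. A minimal Java sketch of checking and stripping such characters before the data reaches the XML; the class name, method names, and sample string are hypothetical:

```java
public class XmlCharCheck {
    // Character ranges permitted by the XML 1.0 "Char" production.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the input with every code point outside those ranges removed.
    static String stripInvalid(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints()
         .filter(XmlCharCheck::isValidXml10)
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample containing two control characters (0x1D and 0x12).
        String description = "Q1 allocation\u001d batch\u0012";
        System.out.println(stripInvalid(description));
    }
}
```

    Whether to strip or replace the characters depends on the report; cleaning the source rows is usually the more durable fix.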

  • HT2506 In Pages how can I link text within the document? I can't find a bookmark option and links only allow me to link to a web page

    Hi all,
    This is my first time using this discussion site! Any help would be great, as I am getting very frustrated with Pages!
    What I am trying to do is link text at the bottom of a document to a page at the beginning of the document. I can't find a bookmark option, and when I click Insert > Link it only provides an option to link to a web page. Help! Thanks

    Bookmarks are missing (sadly) and with them, the ability to create an in-document link.
    I was looking for this feature in a Table of Contents. When you assign a 'style' to text in a document, you can Insert > Table of Contents. Then Pages will keep the TOC updated and you'll be able to 'link' from the page number in the TOC. (Not much help though, for the OP)
