Acrobat 11: what to do with the f character.

Hello all:
   I am parsing text from a PDF using the Acrobat Javascript API.  The code looks like this
   for (var a = 0; a < PageCount; a++) {
       for (var b = 0; b < NumWords; b++) {
           var TheWord = this.getPageNthWord(a, b, false);
       }
   }
The problem with this code is that if the word being returned starts with an f or sometimes a Th, the getPageNthWord() function will not recognize the whole word.
For example, the test passage contains the word "flyway".  getPageNthWord() returns this word in two segments.  The first two letters, "fl", are translated into a single character with decimal code 186, and the last four characters are returned as they are.  So I get two returns from getPageNthWord():
the Fahrenheit short-hand sign
"yway"
In another part of the document the word "fishing" appears.  It is parsed into
blank value
shing
In other words, in all cases here the word is split in two: the first return swallows the first two characters, and the second returns the remaining characters of the word.  What the first two characters are actually interpreted as varies... I can find no pattern, though I suspect there is one.
Anyone have any thoughts on what might be going on  here and how to counteract it?
FYI, the rest of the document is reported on faithfully.
R,
John

It's worth pointing out two things:
1. Ligatures are normal and are considered a sign of superior, professional typesetting.
2. If a PDF follows the recommendations, Acrobat knows what the ligatures mean and automatically replaces them with multiple characters on copy/paste.
However, if you're extracting text yourself, it is up to you to apply the information in the PDF, plus some foreknowledge, and do your own replacement if needed (IF, because some situations require the preservation of ligatures).
You can copy text from Acrobat to test whether the ligatures are encoded correctly.
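
The "do your own replacement" step can be sketched as a small post-processing pass over the extracted words. The sketch below is in Java and runs outside Acrobat. The class name LigatureExpander and the table FONT_SPECIFIC are hypothetical, and the single table entry (code 186 standing for "fl") is taken from the question above, so it would only hold for that document's font and would corrupt any genuine use of that character. Standard Unicode ligature code points such as U+FB01 and U+FB02 are expanded with java.text.Normalizer in NFKC form, which also folds other compatibility characters, so it may be too aggressive for some texts.

```java
import java.text.Normalizer;
import java.util.Map;

final class LigatureExpander {
    // Assumption: in the poster's PDF, character code 186 (U+00BA) stood for "fl".
    // This table is font-specific and must be built per document.
    static final Map<Character, String> FONT_SPECIFIC = Map.of('\u00BA', "fl");

    /** Expands font-specific codes first, then standard Unicode ligatures. */
    static String expand(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            String mapped = FONT_SPECIFIC.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        // NFKC replaces compatibility characters such as U+FB01 (fi) and
        // U+FB02 (fl) with their plain-letter equivalents.
        return Normalizer.normalize(out, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The ligature came back as its own "word", so it is joined with the next one.
        System.out.println(expand("\u00BA") + "yway");   // prints flyway
        System.out.println(expand("\uFB01shing"));       // prints fishing
    }
}
```

Because getPageNthWord() returned the ligature as a separate segment, the caller also has to decide when two consecutive segments belong to one word; the sketch simply concatenates them.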

Similar Messages

  • Hardware Inventory Not collecting Reg Keys with the Underscore Character

    We expanded hardware inventory to collect custom reg keys dynamically (from the child keys). While the new WMI class on the machine is able to see all the custom keys, SCCM is not collecting the ones with the underscore ('_') character in the name. Does anyone know if this is a limitation in SCCM?

    Added to configuration.mof
    #pragma namespace ("\\\\.\\root\\cimv2")
    #pragma deleteclass("Packages", NOFAIL)
    [dynamic, provider("RegProv"), ClassContext("Local|HKEY_LOCAL_MACHINE\\SOFTWARE\\PC\\Packages")]
    Class Packages
    {
    [key] string KeyName;
    [PropertyContext("ApplicationName")] String ApplicationName;
    [PropertyContext("ApplicationVendor")] String ApplicationVendor;
    [PropertyContext("ApplicationVersion")] String ApplicationVersion;
    };
    Added to minimof and imported into default client settings:
    #pragma namespace ("\\\\.\\root\\cimv2\\SMS")
    #pragma deleteclass("Packages_64", NOFAIL)
    [SMS_Report(TRUE),SMS_Group_Name("Packages64"),SMS_Class_ID("Packages64"),
    SMS_Context_1("__ProviderArchitecture=64|uint32"),
    SMS_Context_2("__RequiredArchitecture=true|boolean")]
    Class Packages_64 : SMS_Class_Template
    {
    [SMS_Report(TRUE),key] string KeyName;
    [SMS_Report(TRUE)] String ApplicationName;
    [SMS_Report(TRUE)] String ApplicationVendor;
    [SMS_Report(TRUE)] String ApplicationVersion;
    };
    Thanks

  • Acrobat can not connect with the email program, what can I do?

    Acrobat can not connect with the email program, what can I do?

    Hello,
    I'm sorry you're having trouble; unfortunately, I'm not sure I understand what's going wrong. Would you mind giving me more details about what you're trying to do and what the problem appears to be?
    Just in case you're using the desktop version of Acrobat and not Acrobat.com (which is the forum to which you posted your question), here's a link to their forum so you can repost your question there:
    http://forums.adobe.com/community/acrobat
    Thank you!

  • Using Document Filters with the Japanese character sets

    Not sure if this belongs here or on the Swing Topic but here goes:
    I have been requested to restrict entry in a JTextField to English alphaNumeric and Full-width Katakana.
    The East Asian language support also allows Hiragana and Half-width Katakana.
    I have tried to attach a DocumentFilter. The filter employs a ValidateString method which strips all non (Latin) alphaNumerics as well as anything in the Hiragana, or Half-width Katakana ranges. The code is pretty simple (Most of the code below is dedicated to debugging):
    import java.awt.EventQueue;
    import java.awt.GridLayout;
    import javax.swing.JFrame;
    import javax.swing.JLabel;
    import javax.swing.JTextField;
    import javax.swing.text.AbstractDocument;
    import javax.swing.text.AttributeSet;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.Document;
    import javax.swing.text.DocumentFilter;
    public class KatakanaInputFilter extends DocumentFilter {
        private static int LOW_KATAKANA_RANGE = 0x30A0;
        private static int LOW_HALF_KATAKANA_RANGE = 0xFF66;
        private static int HIGH_HALF_KATAKANA_RANGE = 0xFFEE;
        private static int LOW_HIRAGANA_RANGE = 0x3041;
        public KatakanaInputFilter() {
            super();
        }
        @Override
        public void replace(FilterBypass fb, int offset, int length, String text,
                AttributeSet attrs) throws BadLocationException {
            super.replace(fb, offset, length, validateString(text, offset), null);
        }
        @Override
        public void remove(FilterBypass fb, int offset, int length)
                throws BadLocationException {
            super.remove(fb, offset, length);
        }
        // @Override
        public void insertString(FilterBypass fb, int offset, String string,
                AttributeSet attr) throws BadLocationException {
            // Debugging: dump the code points of the incoming text.
            String newString = new String();
            for (int i = 0; i < string.length(); i++) {
                int unicodePoint = string.codePointAt(i);
                newString += String.format("[%x] ", unicodePoint);
            }
            // Debugging: dump the code points already in the document.
            String oldString = new String();
            int len = fb.getDocument().getLength();
            if (len > 0) {
                String fbText = fb.getDocument().getText(0, len);
                for (int i = 0; i < len; i++) {
                    int unicodePoint = fbText.codePointAt(i);
                    oldString += String.format("[%x] ", unicodePoint);
                }
            }
            System.out.format("insertString %s into %s at location %d\n",
                    newString, oldString, offset);
            super.insertString(fb, offset, validateString(string, offset), attr);
            // Debugging: dump the document again after the insert.
            len = fb.getDocument().getLength();
            if (len > 0) {
                String fbText = fb.getDocument().getText(0, len);
                for (int i = 0; i < len; i++) {
                    int unicodePoint = fbText.codePointAt(i);
                    oldString += String.format("[%x] ", unicodePoint);
                }
            }
            System.out.format("document changed to %s\n\n", oldString);
        }
        public String validateString(String text, int offset) {
            if (text == null) {
                return new String();
            }
            String validText = new String();
            for (int i = 0; i < text.length(); i++) {
                int unicodePoint = text.codePointAt(i);
                boolean acceptChar = false;
                if (unicodePoint < LOW_KATAKANA_RANGE) {
                    if ((unicodePoint < 0x30 || unicodePoint > 0x7a)
                            || (unicodePoint > 0x3a && unicodePoint < 0x41)
                            || (unicodePoint > 0x59 && unicodePoint < 0x61)) {
                        acceptChar = false;
                    } else {
                        acceptChar = true;
                    }
                } else {
                    if ((unicodePoint >= LOW_HALF_KATAKANA_RANGE && unicodePoint <= HIGH_HALF_KATAKANA_RANGE)
                            || (unicodePoint >= LOW_HIRAGANA_RANGE && unicodePoint <= LOW_HIRAGANA_RANGE)) {
                        acceptChar = false;
                    } else {
                        acceptChar = true;
                    }
                }
                if (acceptChar == true) {
                    System.out.format("     Accepted code point = %x\n",
                            unicodePoint);
                    validText += text.charAt(i);
                } else {
                    System.out.format("     Rejected code point = %x\n",
                            unicodePoint);
                }
            }
            String newString = "";
            for (int i = 0; i < validText.length(); i++) {
                int unicodePoint = validText.codePointAt(i);
                newString += String.format("[%x] ", unicodePoint);
            }
            System.out.format("ValidatedString = %s\n", newString);
            return validText;
        }
        /** @param args */
        public static void main(String[] args) {
            Runnable runner = new Runnable() {
                public void run() {
                    JFrame frame = new JFrame("Katakana Input Filter");
                    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                    frame.setLayout(new GridLayout(2, 2));
                    frame.add(new JLabel("Text"));
                    JTextField textFieldOne = new JTextField();
                    Document textDocOne = textFieldOne.getDocument();
                    DocumentFilter filterOne = new KatakanaInputFilter();
                    ((AbstractDocument) textDocOne).setDocumentFilter(filterOne);
                    textFieldOne.setDocument(textDocOne);
                    frame.add(textFieldOne);
                    frame.setSize(250, 90);
                    frame.setVisible(true);
                }
            };
            EventQueue.invokeLater(runner);
        }
    }
    I run this code, use the language bar to switch to Full-width Katakana and type "y" followed by "u", which forms a valid Katakana character. I then use the language bar to switch to Hiragana and retype the "y" followed by "u". When the code sees the Hiragana code point generated by this key combination, it rejects it. My debugging statements show that the document is properly updated. However, when I type the next character, I find that the previously rejected code point is being sent back to my insert method. It appears that the text somehow got cached in the composedTextContent of the JTextField.
    Here is the output of the program when I follow the steps I just outlined:
    insertString [ff59] into at location 0 <== typed y (Katakana)
    Accepted code point = ff59
    ValidatedString = [ff59]
    document changed to [ff59]
    insertString [30e6] into at location 0 <== typed u (Katakana)
    Accepted code point = 30e6
    ValidatedString = [30e6]
    document changed to [30e6]
    insertString [30e6] [ff59] into at location 0 <== typed y (Hiragana)
    Accepted code point = 30e6
    Accepted code point = ff59
    ValidatedString = [30e6] [ff59]
    document changed to [30e6] [ff59]
    insertString [30e6] [3086] into at location 0 <== typed u (Hiragana)
    Accepted code point = 30e6
    Rejected code point = 3086
    ValidatedString = [30e6]
    document changed to [30e6]
    insertString [30e6] [3086] [ff59] into at location 0 <== typed u (Hiragana)
    Accepted code point = 30e6
    Rejected code point = 3086
    Accepted code point = ff59
    ValidatedString = [30e6] [ff59]
    document changed to [30e6] [ff59]
    As far as I can tell, the data in the document looks fine, but the JTextField does not have the same data as the document. At this point it is not displaying the ff59 code point as a "y" (as it does when first entering the Hiragana character); instead it has somehow combined it with another code point to form a complete Hiragana character.
    Can anyone see what it is that I am doing wrong? Any help would be appreciated as I am baffled at this point.

    You have a procedure called "remove" but I don't see you calling it from anywhere in your program. When the validation failed, call remove to remove the bad character.
    V.V.
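
The range check described in the question can also be written with Character.UnicodeBlock instead of hand-written hexadecimal bounds. The following is a minimal sketch under stated assumptions, not the poster's code: the class KatakanaCheck and its method names are hypothetical, "full-width Katakana" is assumed to mean the Unicode Katakana block (U+30A0 to U+30FF), and the sketch does not address the input-method caching behaviour the question is about. It also rejects the intermediate full-width Latin letters (logged as [ff59] above) that appear while a character is being composed, so it is better suited to checking committed text.

```java
final class KatakanaCheck {
    /** True for ASCII letters and digits. */
    static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= '0' && cp <= '9')
            || (cp >= 'A' && cp <= 'Z')
            || (cp >= 'a' && cp <= 'z');
    }

    /** True for code points in the Unicode Katakana block (U+30A0 to U+30FF). */
    static boolean isFullWidthKatakana(int cp) {
        return Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.KATAKANA;
    }

    /** Returns the text with every disallowed code point removed. */
    static String keepAllowed(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isAsciiAlphanumeric(cp) || isFullWidthKatakana(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars so surrogate pairs are handled.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+30E6 is Katakana, U+3086 is Hiragana, U+FF95 is half-width Katakana.
        String filtered = keepAllowed("a1\u30E6\u3086\uFF95");
        System.out.println(filtered.equals("a1\u30E6"));   // prints true
    }
}
```

Using the block lookup keeps the Hiragana and half-width ranges out without spelling out their bounds, at the cost of being stricter than the original filter.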

  • Ways to configure VISA properties associated with the EOS character in LabVIEW

    I am having a great deal of trouble reading consistently from an instrument (HP5328A Universal Counter) and am investigating the EOS character. In ibic, there are 6 properties of interest accessed through the ibconfig command. The following are the properties and their default settings:
    Board level
    - IbcEOSchar EOS character is 0 (zero)
    - IbcEOSrd EOS is ignored during read operations
    - IbcEOSwrt EOI not asserted when EOS sent
    Device level
    - IbcEOSchar EOS character is 0x0A (set in ibdev)
    - IbcEOSrd EOS ignored during read operations
    - IbcEOSwrt EOI not asserted when EOS sent
    I would like to be able to programmatically set these at the beginning of my LabVIEW program. The only relevant VISA properties in the INSTR or GPIB BoardInterface classes I can find are:
    - Send End Enable
    - Suppress End Enable
    - Termination Character
    - Termination Character Enable
    The defaults in my LabVIEW program for both VISA classes mentioned are: Send End Enable is true; Suppress End Enable is false; Termination Character is \10 (Line Feed); and Termination Character Enable is false.
    The only property I change is setting the Termination Character Enable to true for both classes. Is there anything else I can do with properties associated with the EOS char? Is there a property like the IbcEOSwrt that may have a part in generating a service request?
    Just to confirm, the last two bytes of the instrument's output are \CR and \LF. I am using \LF as the EOS. Is this correct or should I be incorporating the \CR in the EOS somehow?
    ANY input is appreciated,
    Chris

    Chris:
    VISA "Termination Character" = NI-488 "IbcEOSchar"
    VISA "Termination Character Enable" = NI-488 "IbcEOSrd"
    VISA "Instr" or "GPIB Instr" = NI-488 "Device level"
    VISA "GPIB BoardInterface" = NI-488 "Board level"
    VISA cannot automatically add the termchar to the end of the written data (like IbcEOSwrt).
    Unless you specifically need to do board-level communication, which is considered advanced, I suggest ignoring that, and sticking to the device-level calls.
    If the instrument's ASCII data responses always end in \LF, then yes, just using \LF as the termchar is the correct thing. In fact, with most 488.2 devices you don't need to worry about the termchar, because the final byte of a response also contains EOI. EOI causes the driver to stop reading from the instrument because it knows that is the end of the response.
    You say you're having trouble, but you don't say what the symptom is. Is the read timing out? You might also want to use NI Spy to get a snapshot of what is going wrong.
    I hope this helps some.
    Dan Mondrik
    Senior Software Engineer, NI-VISA
    National Instruments

  • Problem with the EURO character

    I can't see the Euro character and I don't understand why. I use the symbol \u20AC and it doesn't work. I have used some fonts that include the symbol, but I only get a white space or nothing.
    I haven't tried to print it but I'll try. This problem happens trying to show in JLabels or JTextFields.
    Does anyone know what's wrong?
    I'm working with JDK 1.3.

    Hi Andres,
    I've seen this situation before with several characters that don't display in certain fonts even though font.canDisplay(char) evaluates true.
    For example, Courier New (among many others) and \u20ac . (I don't know why this is?)
    Anyway, use the Arial Unicode MS font and it will display.
    Regards,
    Joe

  • Does acrobat pro XI come with the registration key?

    Does the Acrobat Professional XI come with a registration key?

    Demo version - no
    Retail version - yes
    Subscription version - no

  • Acrobat 9 PDF portfolios with the new version of Reader

    We have a somewhat large PDF portfolio with a few hundred documents in it that was created with Acrobat 9 a few years ago.  It worked great with Reader 9 and was fast, etc.  Since the updates to Reader X and XI, the portfolio is sluggish and very slow to use.  If I install Reader 9 back onto my system, it works great...  The problem is our IT department won't do this globally since they have identified security issues with Reader 9.  Are there any fixes for this problem?

    https://support.mozilla.com/en-US/kb/Using+the+Adobe+Reader+plugin+with+Firefox
    https://support.mozilla.com/en-US/kb/Opening+PDF+files+within+Firefox

  • Printing Problems with the special character &

    Hi All,
    When our users are printing POS, the character & prints as junk, i.e. wherever there is an &, the print preview shows & along with some other characters: <(>&<)>.
    Host Spool Acess Method is F:F: Printing on  Front end computer
    HostPrinter is __Default

    Sorry i was in between and it got posted.
    Device Type is SWINCF CASC Fonts SAPWIN Unicode
    Other special characters like " and - are being recognized. It does not seem to be a printer problem, because I can see the junk in print preview. Has anyone seen this error before? I apologize if I posted this in the wrong forum.
    Regards,
    Ershad Ahmed

  • Problem with the special Character '&'

    Hi,
    I am facing a problem with the special character '&' in report files. When I run the report locally (Report Builder), the data is printed with '&' correctly. But if I execute the same report through the application, via the application server, the data is printed with "&:". It is not displaying &. Please suggest how to overcome this problem.
    'ADP & Liablility' is printed as 'ADP &: Liablility' on the web. We are not sending any value through the URL. This is just a report title; the value is taken from the database and printed.
    Appreciate the quick response.
    Thanks and Regards,
    A.K.Malathi

  • Problem with the special character '&'

    Hi,
    I am facing a problem with the special character '&' in report files. When I run the report locally (Report Builder), the data is printed with '&' correctly. But if I execute the same report through the application, via the application server, the data is printed with "&:". It is not displaying &. Please suggest how to overcome this problem.
    'ADP & Liablility' is printed as 'ADP &: Liablility' on the web. We are not sending any value through the URL. This is just a report title; the value is taken from the database and printed.
    Appreciate the quick response.
    Thanks and Regards,
    A.K.Malathi

    Moderator: Off topic. This forum is for Java Collections API questions. Locking.

  • I downgraded from CC to CS6, and Acrobat XI is now in trial mode. How can I use the Acrobat X that comes with the CS6 Master Collection disc? Should I uninstall Acrobat XI then install Acrobat X? What about de-/reactivation?

    I've read about deactivating/reactivating with Photoshop. If applicable, how is that done?
    Macbook Pro 16GB Mac OS X 10.9.4

    deactivating is done by opening a programs > click help > click deactivate.
    that's not applicable to cc apps.
    to use acrobat x, uninstall acrobat xi, clean (with the Adobe Reader and Acrobat Cleaner Tool from Adobe Labs) and then install acrobat x.

  • Re:"The PDF Output with the Chinese Character"

    Hi all
    I understand that reports developed using Reports 6i are not able to support the language characters even though the database is UTF8.
    Has someone designed and tested the same for Reports 9i?
    if yes please email me at [email protected]

    Hi Jai
    You have two options in Reports 9i if your output format is PDF.
    1. PDF Multibyte Font Aliasing feature
    2. PDF Font subset feature
    If your output format is RTF, Reports 9i fully supports UTF8 character set.
    Read the following article on OTN:
    http://otn.oracle.com/products/reports/htdocs/getstart/whitepapers/pdfenh.htm
    Regards
    Sripathy

  • The project file could not be loaded. name cannot begin with the '3' character hexadecimal value

    Hi, I get this error when I want to create a custom project (from Autodesk 3ds Max SDK 2014). I really dont understand the error, I dont use any '3' character in my project name

    Hi MegaJuzwa,
    Based on your description, it seems that this is not a VS IDE usage issue, am I right? If so, I'm afraid that this is not the correct forum for this issue.
    >>Autodesk 3ds Max SDK 2014
    It seems that this is a third-party tool; if so, I'm sorry, but it is out of the support range of the VS General forum. I found that it has its own support site, so I suggest you post this issue to this forum:
    http://forums.autodesk.com/t5/3ds-max/ct-p/area-c1
    I also found a similar thread on the above site:
    http://forums.autodesk.com/t5/programming/getting-started-problems/td-p/4265618
    If I have misunderstood this issue, please feel free to let me know.
    Best Regards,
    Jack

  • Different results with the same character array.. help me to find out why..

    Hi,
    The following code is giving some unexpected results can anyone explain it...
    public class A1 {
        public static void main(String[] args) {
            char[] a = {'1','2','3','4'};
            System.out.println(a);
            System.out.println(a + "");
        }
    }
    The result is:
    1234
    (some address of array)
    Can anyone tell me why the same array gives the string value one time and an address the other time?
    Edited by: Shenshah on Jun 12, 2008 3:23 AM

    If you look at the API docs you will see that PrintStream has multiple println methods.
    Each of your println calls invokes different one.
    System.out.println(a); calls println(char[]), which according to the docs prints the characters in the array ("1234" in this case).
    System.out.println(a+"") calls println(String). Concatenating the empty string to a causes a.toString() to be invoked, and the "address info" that you see is the result of that toString() call.
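
The two overloads can be compared side by side. This is a small illustrative sketch; the class PrintCharArray and its helper methods are hypothetical names, and String.valueOf(char[]) is shown as one way to get the characters when a String is needed.

```java
final class PrintCharArray {
    /** Concatenation goes through Object.toString(), which yields a type tag and hash. */
    static String concatenated(char[] a) {
        return a + "";
    }

    /** String.valueOf(char[]) copies the characters into a String. */
    static String asText(char[] a) {
        return String.valueOf(a);
    }

    public static void main(String[] args) {
        char[] a = {'1', '2', '3', '4'};
        System.out.println(a);                   // println(char[]): prints 1234
        System.out.println(concatenated(a));     // println(String): starts with [C@
        System.out.println(asText(a) + "");      // println(String): prints 1234
    }
}
```

The "[C" prefix is the JVM's name for the char[] type, and the part after "@" is a hash code in hexadecimal that varies between runs.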
