Determining word boundaries when no space exists in text

I am developing a text search feature for a viewer application, and I often run into PDFs that do not use the space character to delineate word boundaries.  For example, a text-showing operator with individual glyph positioning will contain strings and positioning information like this:
[(de)15(grees)-262(and)-262(who)-262(w)10(ould)-262(contrib)20(ute)-262(an)]TJ
When the strings are concatenated the result is:
"degreesandwhowouldcontributean"
Without spaces it's not possible to split the string into words based on character information.  It would appear the only information that could be used to guess word boundaries is the glyph positioning.  I have tested the documents in Adobe Reader and the application is able to correctly determine where word boundaries are, and it must be doing so by examining the glyph positioning and metrics.
My first approach was to get the glyph width for the space character, and assume a space is any position advance greater than the glyph width of a space.  The problem with that is the case where the font has been subsetted and the 'space' glyph is missing from the font.
My second approach was to calculate the average glyph width for the font, then assume any text advance greater than 33% of the average glyph width is a space.  This works better, but it is still not a reliable general solution.
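The second approach can be sketched roughly as follows. This is only an illustration, not how Adobe Reader does it: the class name TjWordSplitter, the method joinWithSpaces, and the average glyph width of 500 are assumptions made up for the example. It relies on the fact that numbers in a TJ array are expressed in thousandths of a text space unit and are subtracted from the current position, so a negative number opens a gap to the right.

```java
import java.util.Arrays;
import java.util.List;

public class TjWordSplitter {
    /**
     * Joins the elements of a TJ array into text and inserts a space wherever
     * a numeric adjustment opens a gap wider than gapThreshold.
     * Elements are either String (text to show) or Number (adjustment in
     * thousandths of a text space unit; negative moves the next glyph right).
     */
    public static String joinWithSpaces(List<Object> tjArray, double gapThreshold) {
        StringBuilder out = new StringBuilder();
        for (Object element : tjArray) {
            if (element instanceof String) {
                out.append((String) element);
            } else if (element instanceof Number) {
                // The adjustment is subtracted from the position, so negate it.
                double gap = -((Number) element).doubleValue();
                boolean endsWithSpace =
                        out.length() > 0 && out.charAt(out.length() - 1) == ' ';
                if (gap > gapThreshold && out.length() > 0 && !endsWithSpace) {
                    out.append(' ');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Object> tj = Arrays.<Object>asList(
                "de", 15, "grees", -262, "and", -262, "who", -262,
                "w", 10, "ould", -262, "contrib", 20, "ute", -262, "an");
        double averageGlyphWidth = 500.0; // hypothetical value for this font
        // 33% of the average glyph width, as in the second approach
        System.out.println(joinWithSpaces(tj, 0.33 * averageGlyphWidth));
        // prints "degrees and who would contribute an"
    }
}
```

The small positive and negative kerning values (15, 10, 20) stay below the threshold, while the -262 adjustments exceed it. Gaps produced by separate positioning operators between text-showing operators are not covered by this sketch.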
My question: does Adobe have a standard method for determining word boundaries when space characters are missing?

Sounds like it's time for me to play some more "guesswherethewhitespaceis".  I've gleaned a bit by experimenting with Acrobat... feeding it text strings and increasing/decreasing glyph spacing to get an idea of how whitespace thresholds are being derived.  I discovered Acrobat gets word boundaries wrong on occasion, so it seems to be an inexact process at best.
Oh joy... in any case thanks for the reply.

Similar Messages

  • When typing the existing words/letters are deleted?

    When typing the existing words/letters are deleted?

    Hi
    The link given below might help you fix your issue.
    HP and Compaq Desktop PCs - Wired Keyboard Troubleshooting (Windows 7)
    Let us know how it goes!
    "I work for HP."
    Regards
    Manjunath

  • Regex - match a word except when it's preceded by another word

    Does anyone know how to write a regular expression that will match an occurrence of a word except when it's preceded by another word? I'm trying to match all occurrences of the word "function" except when it's part of the phrase "end function". Is that possible in a single regular expression?

    "Maybe this is just how it works, but I'm not sure why a string with one space wouldn't match but a string with two would."
    At the beginning of the spaces, the lookbehind causes the match to fail, but then the Matcher bumps ahead one position and tries again. At that point, the lookbehind expression doesn't apply anymore, so you get a match. (You should be able to confirm this by counting the spaces in your output.) I tried using the possessive plus ("++") to force it to treat all the spaces as one atom, but that didn't work:
      Pattern p = Pattern.compile("(?<!end)(\\s++)function");
    I don't see how to do this using "pure" lookaround, but if you don't mind matching the preceding word, this will work:
      Pattern p = Pattern.compile("(^|(?!end\\b)\\b\\w+ +)function\\b",
                                  Pattern.MULTILINE);
    Getting pretty hairy, I know, but it matches the word "function", either as the first thing on the line, or preceded by a word that is not "end" (those first couple of \b's are there to ensure that only the whole word "end" will block the match). Here's how you would use this pattern to replace "function" with "method", except when it's preceded by "end":
    import java.util.regex.*;
    public class Test
    {
      public static void main(String[] args)
      {
        String target = "end function\n"
                      + "function test\n"
                      + "functioning test\n"
                      + "test function\n"
                      + "test function end\n"
                      + "end    function\n"
                      + "ending function\n"
                      + "rend   function\n"
                      + "end   functioning\n";
        // Group 1 is either the start of the line or the preceding word plus spaces.
        Pattern p = Pattern.compile("(^|(?!end\\b)\\b\\w+ +)function\\b",
                                    Pattern.MULTILINE);
        Matcher m = p.matcher(target);
        // Put group 1 back so that only "function" itself is replaced.
        target = m.replaceAll("$1method");
        System.out.println(target);
      }
    }
    Here's the output I get:
    end function
    method test
    functioning test
    test method
    test method end
    end    function
    ending method
    rend   method
    end   functioning
    Of course, if you do know that there will always be exactly one space between "end" and "function", none of this is necessary; you can just use dcostakos's original lookbehind regex, except that I would add word boundaries:
    Pattern p = Pattern.compile("(?<!end\\s)\\bfunction\\b");
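    A different option from the one in the reply above, shown only as a sketch: Java accepts a lookbehind whose length is bounded, so a range quantifier can stand in for "one or more spaces". The class name BoundedLookbehind and the limit of 10 spaces or tabs are assumptions chosen for illustration; a longer run of whitespace between "end" and "function" would slip through.

```java
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // Assumption: "end" and "function" are separated by 1 to 10 spaces or tabs.
    // The \b before "end" keeps words such as "rend" from blocking the match.
    private static final Pattern FUNCTION_NOT_AFTER_END =
            Pattern.compile("(?<!\\bend[ \\t]{1,10})\\bfunction\\b");

    /** Replaces "function" with "method" unless the word "end" precedes it. */
    public static String replace(String input) {
        return FUNCTION_NOT_AFTER_END.matcher(input).replaceAll("method");
    }

    public static void main(String[] args) {
        System.out.println(replace("end    function"));   // unchanged
        System.out.println(replace("rend   function"));   // rend   method
        System.out.println(replace("ending function"));   // ending method
        System.out.println(replace("function test"));     // method test
    }
}
```

    Because nothing before "function" is consumed, no capturing group has to be put back in the replacement string.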

  • PDF From MS Word 2010 Reported Bug Still Exists

    I need to create a PDF file today that contains images in a MS Word 2010 document.  There is a bug in Acrobat XI Professional that I reported months ago.  This bug is still here, and now I need help.  Can anyone tell me what I can do to fix this issue?
    Here is what happens, and I just reproduced this problem I found a while back.  As I stated, I did report it, but this issue is still here!
    I am creating a user guide for software.  I insert BMP images that are screen shots.  On any page where a BMP image exists within Word, the resulting PDF document does not allow selecting text in Acrobat Reader XI.  The selection goes totally insane and highlights random text through the entire page.  This is a software defect for certain.
    My question is: can I easily fix this with minimal time?  I had to fix this last time by removing all images from MS Word and instead pasting the images in Acrobat XI.  When I do this, selecting text works.  The problem with this workaround is that it takes a lot of time (that I do not have).
    Help please???

    I found something even better.  There is a non Adobe PDF creator on my office computer that correctly creates a PDF.   The issue is within PDF creation, because I am using Acrobat Reader XI and selecting text with no problem at all.
    I did not have to change a thing other than not use Adobe Acrobat Professional to create the PDF.  The software that works correctly where Adobe's own fails is called PDF Xchange Driver by Tracker Software Products LTD.
    ...and as I already stated, I reported this issue many months ago...

  • 2013 word crashes when "save as" .Word freezes "not responding after a few minutes.

    2013 word crashes  when using the "save as" command. It does save the document but crashes. After working for 3 minutes or so on a document the programme will stop running with the screen frozen. Have to use the windows task manager to close.
    Any ideas? Very disappointed as I have already also spent a day trying to get Outlook to work using all the advice on these forums. Just not impressed. Am I going to have the same issues with Excel?

    Hi,
    This issue can have several causes. Let's try to narrow down the root cause with the following methods:
    1. Try disabling all the add-ins to see whether this behavior is caused by a third-party add-in. If the behavior does not occur with all the add-ins disabled, enable them one by one to find out which one causes the issue.
    2. Try repairing your Office installation as a quick fix.
    3. Try starting Word in safe mode to see whether this behavior still occurs. If it does not occur in safe mode, it might be caused by certain items that are loaded with Word.
    4. Try configuring Windows in clean boot mode to check whether this behavior is caused by third-party services. If it does not occur in clean boot mode, follow the instructions in the "How to determine what is causing the problem by performing a clean boot" section of the following support article:
    http://support.microsoft.com/kb/929135
    Here is a similar issue:
    http://social.technet.microsoft.com/Forums/en-US/0d2ed46c-ba1c-4107-9651-9ff97d583720/microsoft-word-2007-freezes-when-saving?forum=word
    We could also try deleting the Word Data registry subkey as a test. Please follow the instructions in the following support article:
    http://support.microsoft.com/kb/921541
    HKEY_CURRENT_USER\Software\Microsoft\Office\15.0\Word\Data
    Note: Back up the registry key before you modify it.
    Regards,
    George Zhao
    TechNet Community Support

  • Microsoft Word Hangs When Opening Document From SharePoint 2010 Document Library

    I am running into an issue where Word hangs when opening certain files from a document library.  When the issue occurs, Word opens and hangs at the Downloading <Doc URL> stage.  I have tried disabling the SharePoint plugins in IE and that makes no difference.  I can download a local copy of the file and open it just fine.  The file in question exists in a separate document library where it can be opened just fine (the file was copied to the library where the issue is occurring by a Workflow).  The issue has also only occurred on .docx files and not .doc; however, not all .docx files are having the issue.
    Does anyone have any thoughts on what might be causing this?
    Thanks.

    Hi,
    According to your post, my understanding is that when you open certain files from a document library, Word hangs at the downloading stage.
    I tried to reproduce the issue; after copying the files with a workflow, the .docx files opened fine in my environment.
    Does the issue occur in other document libraries? You can copy the files to a new document library with a workflow, then check whether it works.
    For more details, we can also check the SharePoint ULS logs:
    C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS
    Have you checked your IE settings (Tools->options->connections->LAN settings->uncheck Automatically detect settings)?
    Please also use the Fiddler tool to trace the requests. You can download it from the following link.
    http://fiddler2.com/
    Thanks & Regards,
    Jason
    Jason Guo
    TechNet Community Support

  • Posting keys for account determination for transaction ERS do not exist

    Dear All,
    When I try to create an accounting document for a billing document, the below error happened:
    Posting keys for account determination for transaction ERS do not exist.
    Has any of you met this issue before, and could you give me some advice? Thanks a lot.
    Ray

    Dear,
        In your pricing procedure, check whether any sales deductions / discounts condition type is maintained, and maintain the ERS account key against it.
    Then go to VKOA and maintain a G/L account against your KOFI and KOFK for account key ERS.
    Try it and let us know.
    regards,
    SUdhir

  • The need to compress to max 1GB is obsolete. You can not determine what and how the space provided on iDisk can be used. You provide it or NOT. If you DO, then let US free, if you don't, then TELL us WHY, so we can choose to go public elsewhere. The stren

    The need to compress to max 1GB is obsolete. You can not determine what and how the space provided on iDisk can be used. You provide it or NOT. If you DO, then let US free, if you don't, then TELL us WHY, so we can choose to go public elsewhere. The strengthood for Apple for the fourthfullness to be legal, which I respect, can become too restrictive in my European eyes. Any hearing around? René

    As was noted in the introductory material for the Apple Support Communities website (presented when you registered), these forums are set up for user-to-user assistance and discussion, not for user-to-Apple communications.
    Suggest you resubmit your position statement to Apple via product feedback for iCloud, which you can access via a link on this page -
    Apple - Feedback
    ...we've all moved to iCloud, isn't it?
    Um, no, we all haven't.

  • How do you save a Word document when it says you have no memory?

    How do you save a Word document when it says you have no memory?

    Connect another disk drive to which you can save the document. Then clear space on your hard drive.
    Freeing Up Space on The Hard Drive
      1. See Lion/Mountain Lion's Storage Display.
      2. You can remove data from your Home folder except for the /Home/Library/ folder.
      3. Visit The XLab FAQs and read the FAQ on freeing up space on your hard drive.
      4. Also see Freeing space on your Mac OS X startup disk.
      5. See Where did my Disk Space go?.
      6. See The Storage Display.
    You must Empty the Trash in order to recover the space the deleted files occupied on the hard drive.
    You should consider replacing the drive with a larger one. Check out OWC for drives, tutorials, and toolkits.
    Try using OmniDiskSweeper 1.8 or GrandPerspective to search your drive for large files and where they are located.

  • Words keep expanding not spaces

    I have just started using Pages and have imported a book that my mother is writing. I have to say that compared to Word, I love the way that Pages works, but there is one problem.
    The majority of text is "Justified" on the page but there are some words that are being pulled apart at the end of a line. As I am aware, justification should expand the spaces between words to fill the line but in this case individual words are being expanded instead and it looks horrible:
    The sentence ends a bit l i k e  t  h  i  s  .
    At first I thought that it was extra spacing caused by confusion after importing from Word, but when I moved the cursor over the text, it jumped the gap between the letters, meaning that there is no written space. I can't find any tick boxes, and when I play around with the text spacing settings, it affects all of the text.
    Any ideas?
    Dan
    iMac G5   Mac OS X (10.4.7)   iWork 06

    Hullo Daniel:
    Pages should justify correctly, except for two known glitches which are not what you are reporting. They are, i) justification is spread across the whole line for the first words after a page break until a line break is reached (remedy, insert a return and type ahead of it), and ii) - very rare - sometimes the justification ends with a line return one character short of the right margin (remedy, expand the character spacing by 1% and return any ligatures within the line to 0% after that). Both of these bugs have been reported.
    I doubt that what you have is a Pages bug. Use Font Book to "validate" your font: select it within Font Book (not the font panel!) and go to File > Validate Font. This will report any problems. Anything not found by this means probably originates from your source material's justifying algorithm. Accordingly you should find that any means of forcing Pages to reformat the material should work.
    Try for example expanding the character spacing by 1% via the text inspector, and then returning it to 0%. To do this you need to select the text first (use keyboard shortcuts under the Help menu to find the quickest ways to do this for a block of text) then open the text inspector to make your adjustment. If the character spacing shows no percentage, this is most likely the problem.
    Setting left as Dennis suggests should also force reformatting, but you should be able to return to justification afterwards without the problem recurring - unless your source program has inserted some kind of kerning spaces which upset the Mac OS X algorithms. If so adjusting & readjusting spacing is more likely to get rid of them. Check your fonts too, as Dennis suggests.
    Mac OS X can handle Windows True Type fonts, but not Windows Postscript.
    However let me repeat: you should be able to justify. When did you last read a full-length book that wasn't, and why shouldn't you see how it looks?
    Personally I think the only writing that should be constrained to a manuscript format is on the back of a postcard.
    Cheers.
    iBook G4   Mac OS X (10.4.7)  

  • How to determine whether a file doesn't exist or doesn't have enough perms

    Hello everyone,
    I am stuck on determining whether a file does not exist or does not have sufficient permissions, so that access to it is denied. I am using the java.io.File.exists() and java.io.File.canRead() methods to check this, but both of them just return false in both of the above cases.
    The documentation does mention that these methods throw SecurityException if a security manager exists and its SecurityManager.checkRead(java.lang.String) method denies read access to the file. But then the problem is to write a security manager that denies read access if the file does not have permissions, so that the exception can be thrown.
    Any suggestions or pointers will be highly appreciated.
    Thank You.
    Regards,
    Vikash Kumar

    "Some platforms will let you rename or remove an open file."
    Unless those platforms support file locking, and the file has a lock on it.
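    For the original question, one possible approach (a sketch, not something proposed in this thread) is to try opening the file through java.nio.file and tell the two cases apart by exception type: NoSuchFileException for a missing file and AccessDeniedException for a permission failure. The class name FileAccessCheck, the Status enum, and the probe method are names invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileAccessCheck {
    public enum Status { READABLE, MISSING, ACCESS_DENIED, OTHER_ERROR }

    /** Tries to open the file for reading and classifies the outcome. */
    public static Status probe(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            return Status.READABLE;
        } catch (NoSuchFileException e) {
            return Status.MISSING;        // the file does not exist
        } catch (AccessDeniedException e) {
            return Status.ACCESS_DENIED;  // the file system denied access
        } catch (IOException e) {
            return Status.OTHER_ERROR;    // some other I/O failure
        }
    }

    public static void main(String[] args) {
        // Pass a path on the command line, or fall back to a sample name.
        Path path = Paths.get(args.length > 0 ? args[0] : "example.txt");
        System.out.println(path + ": " + probe(path));
    }
}
```

    An AccessDeniedException can also come from a parent directory that may not be searched, so ACCESS_DENIED does not prove that the file itself exists.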

  • APP-PAY-07201 Cannot perform a delete when child record exists in future

    I am trying to end-date a payment method of an employee in HR/Payroll 11.5.10.2
    but I receive the following error message:
    APP-PAY-07201 Cannot perform a delete when child record exists in future
    Can you advise what steps I should follow to resolve this issue?
    Regards /Ali

    This note relates to the termination of an employee, whereas our employee is still on the payroll and we just want to change his payment method. But while the existing payment method is present we cannot attach another one, because we receive an error:
    APP-PAY-07041: Priority must be unique within an organizational payment method

  • After upgrading the new operating system, it seems some useful features no longer work such as when typing a message the text anticipates the next word and when trying to delete individual messages from a contact, you can no longer tap and hold to select

    After upgrading the new operating system on my Droid Razr M it seems some useful features no longer work such as when typing a message the text anticipates the next word and when trying to delete individual messages from a contact, you can no longer tap and hold to select multiple message you have to delete them individually or the entire thread. Is there a way to get these back?

    Well, that's kind of embarrassing. And I honestly thought I paid attention to that... It works perfectly now, thank you so much!
    As expected, cdm-git also works fine since DMs only work in the root mode as of now.
    Just for the record, both type commands output "/usr/bin/startx" and pacman -Q gives "systemd 215-4".
    Last edited by looki (2014-08-23 13:04:49)

  • I need help, VoiceOver got accidentally turned on in my iPhone 6, now I can't get into my phone, it won't take my password and when I try to use Siri it says not available. What do I do?

    I need help, VoiceOver got accidentally turned on in my iPhone 6, now I can't get into my phone, it won't take my password and when I try to use Siri it says not available. What do I do?

    Hi, Jennifer.  
    Thank you for visiting Apple Support Communities.  
    I understand that VoiceOver has been activated and you are unable to access your device.  I have done this myself and here are the steps to disable this feature.  
    VoiceOver
    Press the home button three times quickly (formerly "Triple-click home").
    Accessibility Shortcut
    Managing Accessibility features using iTunes
    Connect your iPhone, iPad, or iPod touch to any computer with iTunes installed.
    In iTunes, select your device.
    From the Summary pane, click > Configure Universal Access in the Options section at the bottom.
    Select the feature you would like to use and click OK.
    Use Accessibility features in iOS
    -Jason H.  

  • My Mac got hacked. I was working on a word document when the computer suddenly started typing meaningful sentences on its own that describes how the hacker is skillful. At that time I was on a password protected wifi and file sharing was off.

    This is the first time I have been hacked this badly. I was working on a Microsoft Word document when the computer suddenly started typing meaningful sentences on its own that describe how skillful the hacker is. At that time I was on a friend's wifi network that is password protected (not sure about the encryption), and the OS X Firewall was on. I was using the admin profile; however, file sharing was off. I'm very careful not to install any suspicious 3rd party software.
    So far I have verified permissions and fixed some errors there, and changed passwords.
    Do I have to erase/format my computer and reinstall the OS? If so, is it adequate to use the internet recovery tool, or will it use old and possibly infected EFI/Root files?
    Would appreciate the advice of all the Mac experts out there. Thanks

    Please read this whole message before doing anything.
    This procedure is a diagnostic test. It won’t solve your problem. Don’t be disappointed when you find that nothing has changed after you complete it.
    Third-party system modifications are a common cause of usability problems. By a “system modification,” I mean software that affects the operation of other software — potentially for the worse. The following procedure will help identify which such modifications you've installed. Don’t be alarmed by the complexity of these instructions — they’re easy to carry out and won’t change anything on your Mac. 
    These steps are to be taken while booted in “normal” mode, not in safe mode. If you’re now running in safe mode, reboot as usual before continuing. 
    Below are instructions to enter some UNIX shell commands. The commands are harmless, but they must be entered exactly as given in order to work. If you have doubts about the safety of the procedure suggested here, search this site for other discussions in which it’s been followed without any report of ill effects. 
    Some of the commands will line-wrap or scroll in your browser, but each one is really just a single line, all of which must be selected. You can accomplish this easily by triple-clicking anywhere in the line. The whole line will highlight, and you can then copy it. The headings “Step 1” and so on are not part of the commands. 
    Note: If you have more than one user account, Step 2 must be taken as an administrator. Ordinarily that would be the user created automatically when you booted the system for the first time. The other steps should be taken as the user who has the problem, if different. Most personal Macs have only one user, and in that case this paragraph doesn’t apply. 
    Launch the Terminal application in any of the following ways: 
    ☞ Enter the first few letters of its name into a Spotlight search. Select it in the results (it should be at the top.) 
    ☞ In the Finder, select Go ▹ Utilities from the menu bar, or press the key combination shift-command-U. The application is in the folder that opens. 
    ☞ Open LaunchPad. Click Utilities, then Terminal in the icon grid. 
    When you launch Terminal, a text window will open with a line already in it, ending either in a dollar sign (“$”) or a percent sign (“%”). If you get the percent sign, enter “sh” and press return. You should then get a new line ending in a dollar sign. 
    Step 1 
    Triple-click anywhere in the line of text below on this page to select it:
    kextstat -kl | awk '!/com\.apple/{printf "%s %s\n", $6, $7}' | open -ef
    Copy the selected text to the Clipboard by pressing the key combination command-C. Then click anywhere in the Terminal window and paste (command-V). I've tested these instructions only with the Safari web browser. If you use another browser, you may have to press the return key after pasting. A TextEdit window will open with the output of the command. If the command produced no output, the window will be empty. Post the contents of the TextEdit window (not the Terminal window), if any — the text, please, not a screenshot. You can then close the TextEdit window. The title of the window doesn't matter, and you don't need to post that. No typing is involved in this step.
    Step 2 
    Repeat with this line:
    { sudo launchctl list | sed 1d | awk '!/0x|com\.(apple|openssh|vix\.cron)|org\.(amav|apac|cups|isc|ntp|postf|x)/{print $3}'; echo; sudo launchctl getenv DYLD_INSERT_LIBRARIES; echo; sudo defaults read com.apple.loginwindow LoginHook; echo; sudo crontab -l; } 2> /dev/null | open -ef
    This time you'll be prompted for your login password, which you do have to type. Nothing will be displayed when you type it. Type it carefully and then press return. You may get a one-time warning to be careful. Heed that warning, but don't post it. If you see a message that your username "is not in the sudoers file," then you're not logged in as an administrator. 
    Note: If you don’t have a login password, you’ll need to set one before taking this step. If that’s not possible, skip to the next step. 
    Step 3
    { launchctl list | sed 1d | awk '!/0x|com\.apple|org\.(x|openbsd)/{print $3}'; echo; launchctl getenv DYLD_INSERT_LIBRARIES; echo; crontab -l 2> /dev/null; } | open -ef
    Step 4
    ls -A /e*/{cr,la,mach}* {,/}Lib*/{Ad,Compon,Ex,Fram,In,Keyb,La,Mail/Bu,P*P,Priv,Qu,Scripti,Servi,Spo,Sta}* L*/Fonts .la* 2> /dev/null | open -ef
    Important: If you formerly synchronized with a MobileMe account, your me.com email address may appear in the output of the above command. If so, anonymize it before posting. 
    Step 5
    osascript -e 'tell application "System Events" to get name of login items' | open -ef
    Remember, steps 1-5 are all copy-and-paste — no typing, except your password. Also remember to post the output. 
    You can then quit Terminal.
