Problem extracting text with pdfbox.

I'm trying to search for some text in a pdf file using pdfbox. Once I find that text I mark the page so I can split that page out using the pdfsplitter. This has been working for a little over a year now. However, recently we received a new batch of pdf's to parse last month that were generated using ghostscript. All of these pdf's failed to parse. When debugging it, I noticed that when I am getting the COSStrings out of the pdf's they seem to be invalid characters (). I thought at first that this was a batch of bad pdf's, but I can open the pdf's just fine. I also tried using the PDFTextStripper to retrieve the text and that displayed it just fine as well.
The code I am using is taken from one of the examples in pdfbox (PDFBox-0.7.3\src\org\pdfbox\examples\pdmodel\ReplaceString.java). The code is as follows (with the only difference that I am not replacing the string and saving the file, but I am searching for text and saving the page number that I find the text on):
PDDocument doc = null;
try
doc = PDDocument.load( inputFile );
List pages = doc.getDocumentCatalog().getAllPages();
for( int i=0; i<pages.size(); i++ )
PDPage page = (PDPage)pages.get( i );
PDStream contents = page.getContents();
PDFStreamParser parser = new PDFStreamParser(contents.getStream() );
parser.parse();
List tokens = parser.getTokens();
for( int j=0; j<tokens.size(); j++ )
Object next = tokens.get( j );
if( next instanceof PDFOperator )
PDFOperator op = (PDFOperator)next;
//Tj and TJ are the two operators that display
//strings in a PDF
if( op.getOperation().equals( "Tj" ) )
//Tj takes one operator and that is the string
//to display so lets update that operator
COSString previous = (COSString)tokens.get( j-1 );
String string = previous.getString();
string = string.replaceFirst( strToFind, message );
previous.reset();
previous.append( string.getBytes() );
else if( op.getOperation().equals( "TJ" ) )
COSArray previous = (COSArray)tokens.get( j-1 );
for( int k=0; k<previous.size(); k++ )
Object arrElement = previous.getObject( k );
if( arrElement instanceof COSString )
COSString cosString = (COSString)arrElement;
String string = cosString.getString();
string = string.replaceFirst( strToFind, message );
cosString.reset();
cosString.append( string.getBytes() );
//now that the tokens are updated we will replace the
//page content stream.
PDStream updatedStream = new PDStream(doc);
OutputStream out = updatedStream.createOutputStream();
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens( tokens );
page.setContents( updatedStream );
doc.save( outputFile );
finally
if( doc != null )
doc.close();
If anyone knows why this code is not extracting the text as the PDFTextStripper does or how I can modify this I would greatly appreciate the help.
Thanks.

Thanks.It works-sort of. I can copy into TextEdit without problems. This is probably the solution by itself because I can save it. Copying into Words (from TextEdit) didn't work. Copying into Pages partially works: the graphs are reproduced but the layout comes out funny.
Thank you

Similar Messages

  • Extract text with specific format ?

    Hello,
    Is there a way to extract text with a specific format in a document (i.e. font type/ size or even font colour)?
    thanks in advance!

    Hello gillad,
    I am afraid only indicators are the bold font or text colour...
    Having said that, just as I was writting my response, the following idea came to me:
    Convert the pdf into word
    Click on text of interest (text with distinct format)
    Use the feature "select all text with similar formating (no data)" under "editing" within the "home" ribbon
    Having said that, hopefully a tool set/ action can be developed one day...

  • Problem extracting, query with display attributes, in RSCRM_BAPI.

    Has anybody experienced that problem?
    When I execute the extract, the job cancels itself after 2 minutes. Sometimes even shuts down the DEV server.
    There is no Dump in ST22.
    The only error I get is:
    TRUNCATE TABLE "/BIC/0CZTEST" where ZTEST is the extractor
    But when I execute the extract with the same query without the display attributes it runs just fine.
    Help please...

    Check Table("/BIC/0CZTEST") using T-code "SE14".
    If necessary, drop and creat, again.
    Good luck.

  • Problem inserting text with special Hungarian characters into MySQL database

    When I insert text into my MySQL db the special Hungarian
    characters (ő,ű) they change into "?".
    When I check the
    <cfoutput>#FORM.special_character#</cfoutput> it gives
    me the correct text, things go wrong just when writing it into the
    db. My hosting provider said the following: "please try to
    evidently specify "latin2" charset with "latin2_hungarian_ci"
    collation when performing any operations with tables. It is
    supported by the server but not used by default." At my former
    hosting provider I had no such problem. Anyway how could I do what
    my hosting provider has suggested. I read a PHP related article
    that said use "SET NAMES latin2". How could I do such thing in
    ColdFusion? Any suggestion? Besides I've tried to use UTF8 and
    Latin2 character encoding both on my pages and in the db but with
    not much success.
    I've also read a French language message here in this forum
    that suggested to use:
    <cfscript>
    setEncoding("form", "utf-8");
    setEncoding("url", "utf-8");
    </cfscript>
    <cfcontent type="text/html; charset=utf-8">
    I' ve changed the utf-8 to latin2 and even to iso-8859-2 but
    didn't help.
    Thanks, Aron

    I read that it would be the most straightforward way to do
    everything in UTF-8 because it handles well special characters so
    I've tried to set up a simple testing environment. Besides I use CF
    MX7 and my hosting provider creates the dsn for me so I think the
    db driver is JDBC but not sure.
    1.) In Dreamweaver I created a page with UTF-8 encoding set
    the Unicode Normalization Form to "C" and checked the include
    unicode signature (BOM) checkbox. This created a page with the meta
    tag: <meta http-equiv="Content-Type" content="text/html;
    charset=utf-8" />. I've checked the HTTP header with an online
    utility at delorie.com and it gave me the following info:
    HTTP/1.1, Content-Type: text/html; charset=utf-8, Server:
    Microsoft-IIS/6.0
    2.) Then I put the following codes into the top of my page
    before everything:
    <cfprocessingdirective pageEncoding = "utf-8">
    <cfset setEncoding("URL", "utf-8")>
    <cfset setEncoding("FORM", "utf-8")>
    <cfcontent type="text/html; charset=utf-8">
    3.) I wrote some special Hungarian chars
    (<p>őű</p>) into the page and they displayed
    well all the time.
    4.) I've created a simple MySQL db (MySQL Community Edition
    5.0.27-community-nt) on my shared hosting server with phpMyAdmin
    with default charset of UTF-8 and choosing utf8_hungarian_ci as
    default collation. Then I creted a MyISAM table and the collation
    was automatically applied to my varchar field into wich I stored
    data with special chars. I've checked the properties of the MySQL
    server in MySQL-Front prog and found the following settings under
    the Variables tab: character_set_client: utf8,
    character_set_connection: utf8, character_set_database: latin1,
    character_set_results: utf8, character_set_server: latin1,
    character_set_system: utf8, collation_connection: utf8_general_ci,
    collation_database: latin1_swedish_ci, collation_server:
    latin1_swedish_ci.
    5.) I wrote a simple insert form into my page and tried it
    using both the content of the form field and a hardcoded string
    value and even tried to read back the value of the
    #FORM.special_char# variable. In each cases the special Hungarian
    chars changed to "q" or "p" letters.
    Can anybody see something wrong in the above mentioned or
    have an idea to test something else?
    I am thinking about to try this same page against a db on my
    other hosting providers MySQL server.
    Here is the to the form:
    http://209.85.117.174/pages/proba/chartest/utf8_1/form.cfm
    Thanks, Aron

  • Problem copying text with graphs from web pages

    I can't copy text+graph from webpages. Copying into Pages or Microsoft Word for Mac, the text is copied the graphs are not. (With Firefox, I can't even copy text!)
    I have Parallels, and on the Microsoft side I have no problem copying both text and graphs simultaneously.
    Could you, please, advice

    Thanks.It works-sort of. I can copy into TextEdit without problems. This is probably the solution by itself because I can save it. Copying into Words (from TextEdit) didn't work. Copying into Pages partially works: the graphs are reproduced but the layout comes out funny.
    Thank you

  • The problem I have since I upgraded to Mavericks version 10.9.1 The problem appears only with Mail not with other programs, not even with my browser. When I try to zoom the text of an e-mail I received or sent , I can no longer use the keys Command   to e

    the problem I have since I upgraded to Mavericks version 10.9.1
    The problem appears only with Mail not with other programs, not even with my browser.
    When I try to zoom the text of an e-mail I received or sent , I can no longer use the keys Command + to enlarge the text, although I can reduce it with Command -.
    As I have a problem with my eyes, This is a serious matter for me.
    When I write an e-mail, if I select text and press Command +, it just displaces the text to the right.
    Now, my husband has a USB keyboard. If he connects it to my computer, his regular Command + does not work either, but  he uses the extended keyboard, then it works. Unfortunately, he needs it for a musical application which does not work with a wireless keyboard.

    Firefox 3.6.4 and 3.6.6 use a process called, "plugin-container.exe" which was using up most of my CPU when I opened up multiple tabs that contained Adobe Flash files, and caused Firefox to lock up.
    My solution was to use Firefox 3.5.10 which you can get from the Mozilla website at [http://www.mozilla.com/en-US/firefox/all-older.html]
    I am using Adobe Flash 10.1.53.64 without any problem in this version of Firefox. Check the release notes, I believe it contains all the latest security fixes in "Firefox 3.6.4".
    Hopefully, they will fix Firefox 3.6 in the next version (e.g. Firefox 3.6.7), until then you should probably use "Firefox 3.5.10".

  • Help with problem specifying text properties (style & antiAliasMethod)

    I would be very grateful if someone could help me figure out what I am doing wrong in specifying certain text properties, namely style and antiAliasMethod.  I have tried many different things and consulted the JavaScript Reference Guide, Tools Guide and PS CS4 Scripting Guide, as well as several example scripts, but I can't seem to find the information needed.
    The snippet of code below adds a text layer to a document and places text on it;  What I want is white Arial bold 10 point text with no antialiasing (this would appear as "none" in the PS CS4 text tool dialog).  This code does everything except get the bold property and the antialiasing = none property.  Instead, what this code produces is white Arial normal (not bold) crisp (not none) text,
    My attempt to get no antialiasing instead of "crisp" causes an execution error, so that line is commented out below.  The line setting style to "BOLD" does not create an execution error, but it doesn't give me bold, either (instead I get normal).
    I am certain these errors mean I do not understand the correct way to express the code to set these properties.  Can someone set me straight about how to do this?
    Thanks.
            // Create a new text layer at the top of the document
            var myLayerRef = newDocRef.artLayers.add();
            myLayerRef.kind = LayerKind.TEXT;
            myLayerRef.name = "Test";
            var myTextRef = myLayerRef.textItem;
            // Set the text color, font, size, etc.
            var textColor = new SolidColor;
            textColor.rgb.red = 255;
            textColor.rgb.green = 255;
            textColor.rgb.blue = 255;
            myTextRef.color = textColor;
            myTextRef.font = "ArialMT";
            myTextRef.style = "BOLD";
    //        myTextRef.antiAliasMethod = "NONE";
            myTextRef.size = 10;
            // Set the text position, blend mode and opacity
            var tx = 10;
            var ty = 50;       
            myTextRef.position = new Array(tx,ty);
            myLayerRef.blendMode = BlendMode.NORMAL;
            myLayerRef.opacity = 100;
            // Insert the text
            myTextRef.contents = "Hello World";

    Update:
    I discovered how to fix both problems myself.
    The antialiasing was fixed by statement: myTextRef.antiAliasMethod = AntiAlias.NONE;
    The bold problem was fixed by changing the font name to "Arial-BoldMT" and deleting the statement which attempted to set the style to "BOLD".
    Thanks for your interest.  No further help is needed on this issue.

  • Problem to extract text from HTML document

    I have to extract some text from HTML file to my database. (about 1000 files)
    The HTML files are get from ACM Digital Library. http://portal.acm.org/dl.cfm
    The HTML page is about the information of a paper. I only want to get the text of "Title" "Abstract" "Classification" "Keywords"
    The Problem is that I can't find any patten to parser the html files"
    EX: I need to get the Classification = "Theory of Computation","ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY","Numerical Algorithms and Problem","Mathematics of Computing","NUMERICAL ANALYSIS"......etc .
    The section code about "Classification" is below.
    Please give any idea to do this, or how to find patten to extract text from this.
    <div class="indterms"><a href="#CIT"><img name="top" src=
    "img/arrowu.gif" hspace="10" border="0" /></a><span class=
    "heading"><a name="IndexTerms">INDEX TERMS</a></span>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Primary Classification:</a></span><br />
    � <b>F.</b> <a href=
    "results.cfm?query=CCS%3AF%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory of Computation</a><br />
    � <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>F.2</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">ANALYSIS OF ALGORITHMS AND PROBLEM
    COMPLEXITY</a><br />
    � � � <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>F.2.1</b> <a href=
    "results.cfm?query=CCS%3A%22F%2E2%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Numerical Algorithms and Problems</a><br />
    </p>
    <p class="Categories"><span class="heading"><a name=
    "GenTerms">Additional�Classification:</a></span><br />
    � <b>G.</b> <a href=
    "results.cfm?query=CCS%3AG%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Mathematics of Computing</a><br />
    � <img src="img/tree.gif" border="0" height="20" width=
    "20" /> <b>G.1</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">NUMERICAL ANALYSIS</a><br />
    � � � <img src="img/tree.gif" border="0" height=
    "20" width="20" /> <b>G.1.6</b> <a href=
    "results.cfm?query=CCS%3A%22G%2E1%2E6%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Optimization</a><br />
    � � � � � <img src="img/tree.gif" border=
    "0" height="20" width="20" /> <b>Subjects:</b> <a href=
    "results.cfm?query=CCS%3A%22Linear%20programming%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Linear programming</a><br />
    </p>
    <br />
    <p class="GenTerms"><span class="heading"><a name=
    "GenTerms">General Terms:</a></span><br />
    <a href=
    "results.cfm?query=genterm%3A%22Algorithms%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Algorithms</a>, <a href=
    "results.cfm?query=genterm%3A%22Theory%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Theory</a></p>
    <br />
    <p class="keywords"><span class="heading"><a name=
    "Keywords">Keywords:</a></span><br />
    <a href=
    "results.cfm?query=keyword%3A%22Simplex%20method%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">Simplex method</a>, <a href=
    "results.cfm?query=keyword%3A%22complexity%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">complexity</a>, <a href=
    "results.cfm?query=keyword%3A%22perturbation%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">perturbation</a>, <a href=
    "results.cfm?query=keyword%3A%22smoothed%20analysis%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
    target="_self">smoothed analysis</a></p>
    </div>

    One approach is to download Htmlparser from sourceforge
    http://htmlparser.sourceforge.net/ and write the rules to match title, abstract etc.
    Another approach is to write your own parser that extract only title, abstract etc.
    1. tokenize the html file. --> convert html into tokens (tag and value)
    2. write a simple parser to extract certain information
    find out about the pattern of text you want to extract. For instance "<class "abstract">.
    then writing a rule for extracting abstract such as
    if (tag is abstract ) then extract abstract text
    apply the same concept for other tags
    Attached is the sample parser that was used to extract title and abstract from acm html files. Please modify to include keyword and other fields.
    good luck
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    public class ACMHTMLParser
         private String m_filename;
         private URLLexicalAnalyzer lexical;
         List urls = new ArrayList();
         public ACMHTMLParser(String filename)
              super();
              m_filename = filename;
          * parses only title and abstract
         public void parse() throws Exception
              lexical = new URLLexicalAnalyzer(m_filename);
              String word = lexical.getNextWord();
              boolean isabstract = false;
              while (null != word)
                   if (isTag(word))
                        if (isTitle(word))
                             System.out.println("TITLE: " + lexical.getNextWord());
                        else if (isAbstract(word) && !isabstract)
                             parseAbstract();
                             isabstract = true;
                   word = lexical.getNextWord();
              lexical.close();
         public static void main(String[] args) throws Exception
              ACMHTMLParser parser = new ACMHTMLParser("./acm_html.html");
              parser.parse();
         public static boolean isTag(String word)
              return ( word.startsWith("<") && word.endsWith(">"));
         public static boolean isTitle(String word)
              return ( "<title>".equals(word));
         //please modify according to the html source
         public static boolean isAbstract(String word)
              return ( "<p class=\"abstract\">".equals(word));
         private void parseAbstract() throws Exception
              while (true)
                   String abs = lexical.getNextWord();
                   if (!isTag(abs))
                        System.out.println(abs);
                        break;
         class URLLexicalAnalyzer
           private BufferedReader m_reader;
           private boolean isTag;
           public URLLexicalAnalyzer(String filename)
              try
                m_reader = new BufferedReader(new FileReader(filename));
              catch (IOException io)
                System.out.println("ERROR, file not found " + filename);
                System.exit(1);
           public URLLexicalAnalyzer(InputStream in)
              m_reader = new BufferedReader(new InputStreamReader(in));
           public void close()
              try {
                if (null != m_reader) m_reader.close();
              catch (IOException ignored) {}
           public String getNextWord() throws IOException
              int c = m_reader.read();   
              if (-1 == c) return null; 
              if (Character.isWhitespace((char)c))
                return getNextWord();
              if ('<' == c || isTag)
                return scanTag(c);
              else
                   return scanValue(c);
           private String scanTag(final int c)
              throws IOException
              StringBuffer result = new StringBuffer();
              if ('<' != c) result.append('<');
              result.append((char)c);
              int ch = -1;
              while (true)
                ch = m_reader.read();
                if (-1 == ch) throw new IllegalArgumentException("un-terminate tag");
                if ('>' == ch)
                     isTag = false;
                     break;
                result.append((char)ch);
              result.append((char)ch);
              return result.toString();
           private String scanValue(final int c) throws IOException
                StringBuffer result = new StringBuffer();
                result.append((char)c);
                int ch = -1;
                while (true)
                   ch = m_reader.read();
                   if (-1 == ch) throw new IllegalArgumentException("un-terminate value");
                   if ('<' == ch)
                        isTag = true;
                        break;
                   result.append((char)ch);
                return result.toString();
    }

  • I cant use the highlight, underline, or strikethrough function in a specific pdf file. The file isnt locked. I used to highlight texts from that file before the latest update. The problem occurs only with that file. Urgent need. Please help. Thanks!

    i cant use the highlight, underline, or strikethrough function in a specific pdf file. The file isnt locked. I used to highlight texts from that file before the latest update. The problem occurs only with that file. Urgent need. Please help. Thanks!

    Chester31,
    Thank you very much for sharing your file with us!  Now that we are able to reproduce the problem at our end, you may stop sharing the file on Acrobat.com.
    Do you know when this problem (for not being able to add new highlight/strikeout/underline) has started?  Did you update your iOS from 7.x to 8.0 recently?
    We will continue investigating the problem and let you know what we find.
    Thank you again for your help.

  • Problem parsing XML with schema when extracted from a jar file

    I am having a problem parsing XML with a schema, both of which are extracted from a jar file. I am using using ZipFile to get InputStream objects for the appropriate ZipEntry objects in the jar file. My XML is encrypted so I decrypt it to a temporary file. I am then attempting to parse the temporary file with the schema using DocumentBuilder.parse.
    I get the following exception:
    org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element '<root element name>'
    This was all working OK before I jarred everything (i.e. when I was using standalone files, rather than InputStreams retrieved from a jar).
    I have output the retrieved XML to a file and compared it with my original source and they are identical.
    I am baffled because the nature of the exception suggests that the schema has been read and parsed correctly but the XML file is not parsing against the schema.
    Any suggestions?
    The code is as follows:
      public void open(File input) throws IOException, CSLXMLException {
        InputStream schema = ZipFileHandler.getResourceAsStream("<jar file name>", "<schema resource name>");
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = null;
        try {
          factory.setNamespaceAware(true);
          factory.setValidating(true);
          factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
          factory.setAttribute(JAXP_SCHEMA_SOURCE, schema);
          builder = factory.newDocumentBuilder();
          builder.setErrorHandler(new CSLXMLParseHandler());
        } catch (Exception builderException) {
          throw new CSLXMLException("Error setting up SAX: " + builderException.toString());
        Document document = null;
        try {
          document = builder.parse(input);
        } catch (SAXException parseException) {
          throw new CSLXMLException(parseException.toString());
        }

    I was originally using getSystemResource, which worked fine until I jarred the application. The problem appears to be that resources returned from a jar file cannot be used in the same way as resources returned directly from the file system. You have to use the ZipFile class (or its JarFile subclass) to locate the ZipEntry in the jar file and then use ZipFile.getInputStream(ZipEntry) to convert this to an InputStream. I have seen example code where an InputStream is used for the JAXP_SCHEMA_SOURCE attribute but, for some reason, this did not work with the InputStream returned by ZipFile.getInputStream. Like you, I have also seen examples that use a URL but they appear to be URL's that point to a file not URL's that point to an entry in a jar file.
    Maybe there is another way around this but writing to a file works and I set use File.deleteOnExit() to ensure things are tidied afterwards.

  • Applescript or workflow to extract text from PDF and rename PDF with the results

    Hi Everyone,
    I get supplied hundreds of PDFs which each contain a stock code, but the PDFs themselves are not named consistantly, or they are supplied as multi-page PDFs.
    What I need to do is name each PDF with the code which is in the text on the PDF.
    It would work like this in an ideal world:
    1. Split PDF into single pages
    2. Extract text from PDF
    3. Rename PDF using the extracted text
    I'm struggling with part 3!
    I can get a textfile with just the code (using a call to BBEDIT I'm extracting the code)
    I did think about using a variable for the name, but the rename functions doesn't let me use variables.

    Hello
    You may also try the following applescript script, which is a wrapper of rubycocoa script. It will ask you choose source pdf files and destination directory. Then it will scan text of each page of pdf files for the predefined pattern and save the page as new pdf file with the name as extracted by the pattern in the destination directory. Those pages which do not contain string matching the pattern are ignored. (Ignored pages, if any, are reported in the result of script.)
    Currently the regex pattern is set to:
    /HB-.._[0-9]{6}/
    which means HB- followed by two characters and _ and 6 digits.
    Minimally tested under 10.6.8.
    Hope this may help,
    H
    _main()
    on _main()
        script o
            property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
                default location (path to desktop) with multiple selections allowed
            set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
                default location (path to desktop)
            set args to ""
            repeat with a in my aa
                set args to args & a's POSIX path's quoted form & space
            end repeat
            considering numeric strings
                if (system info)'s system version < "10.9" then
                    set ruby to "/usr/bin/ruby"
                else
                    set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
                end if
            end considering
            do shell script ruby & " <<'EOF' - " & args & "
    require 'osx/cocoa'
    include OSX
    require_framework 'PDFKit'
    outdir = ARGV.shift.chomp('/')
    ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
        url = NSURL.fileURLWithPath(f)
        doc = PDFDocument.alloc.initWithURL(url)
        path = doc.documentURL.path
        pcnt = doc.pageCount
        (0 .. (pcnt - 1)).each do |i|
            page = doc.pageAtIndex(i)
            page.string.to_s =~ /HB-.._[0-9]{6}/
            name = $&
            unless name
                puts \"no matching string in page #{i + 1} of #{path}\"
                next # ignore this page
            end
            doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
            unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
                puts \"failed to save page #{i + 1} of #{path}\"
            end
        end
    end
    EOF"
        end script
        tell o to run
    end _main

  • Problems Extracting Raw off 5d Mk2 with PSE7 on Windows 7

    Hi
    Anyone had any problems extracting the RAW photos off a 5dmk2 camera using PSE7 on Windows 7.
    Basically, the problem occurs when the the camera has Raw files on aswell as jpegs. When I select the option to get files from camera or card reader, I can select the device and then the camera is analysed and the number of files counted. Fine. However, if I then press Get Photo's the window kind of freezes, nothing happens and the window can't be closed. If there are just jpegs on the camera then there's not a problem but I want to be able to use Raws.
    It works fine on Windows XP SP3 with Raw plugin 5.4.
    I've tried all the raw plugins I can up until the latest one that is suitable (v5.6).
    I'm not confusing the 32bit or 64bit versions.
    I'm following the exact instructions on installing the raw plugin's properly.
    I've tried all sorts of different settings in Windows compatibility mode.
    I've run as the administrator.
    I've read numerous posts to see what solutions people have to related but not identical problems.
    I don't want to process it separately and turn it into a company's own file format first (forgotten what it was), I just want to use the Raw files in the same way that they worked on Windows XP.
    Don't know what else to try.......
    Can anyone please help?
    Thanks in advance.....

    Ok.... I've got a work around but there's a definite bug here..... not sure why this should happen.
    Basically, the only way I can extract the raws off the camera is to import a new raw from a folder i.e. 'File > Get
    Photos > From Files or Folders'.
    Then if I go to 'File > Get Photos > From Camera or Card Reader' everything works as it should, the files get extracted
    from the camera.
    Assumed it would be working ok permenantly after having to do that yesterday, but couldn't get raws off again without
    repeating the steps above......
    ....wierd....it's almost like it needs reminding that it can handle raws before extracting.....
    I know someone on another blog said they had the same set up and didn't have a problem, but I wonder if it's something
    subtle in the differences between our base system setup......
    Anyway hope this helps give people a work around if they encounter the same issue?
    If anyone has a permenant solution please let me know as I don't want to have to rename and then reimport a raw file
    everytime I extract photos......

  • I'm having an unusual problem with my iphone 4S. When I'm making a call, it suddenly sounds like my call has been invaded by aliens! Why is it doing this? I'm also having problems sending texts. Sometimes it sends sometimes it doesn't.

    I'm having an unusual problem with my iphone 4S. When I'm making a call, it suddenly sounds like my call has been invaded by aliens! Why is it doing this? I'm also having problems sending texts. Sometimes it sends sometimes it doesn't. What is going on? Can it be fixed? I'm really not liking this new phone!!!!

    Could be an issue with your Carrier and/or location...
    The Basic Troubleshooting Steps are:
    Restart..  Reset..  Restore from Backup...  Restore as New...
    Try a Reset...
    Press the sleep/wake button & home button at the same time, keep pressing until you see the Apple logo, then release the buttons...
    http://support.apple.com/kb/ht1430

  • My iPhone 5 doesn't send any text messages or i messages and the problem isn't with the sim card as i tried it with another iPhone 5 and it worked fine.

    my iPhone 5 doesn't send any text messages or i messages and the problem isn't with the sim card as i tried it with another iPhone 5 and it worked fine.

    OK. Thanks for sharing.
    Have a nice day.

  • TS4268 Since upgrading to iOS 7.0, my friend can no longer text me.  I get all other people's texts with no problem.  I can text her, she can't text me.  It bounces back to her phone as "message undeliverable".  Does anyone know what to do about that?

    Since upgrading to iOS 7.0, my friend can no longer text me.  I get all other people's texts with no problem.  I can text her, she can't text me.  It bounces back to her phone as "message undeliverable".  Does anyone know what to do about that?

    your number can remain on the imessage server for some time

Maybe you are looking for

  • My led and external display flicker a lot. What to do?

    I have a late 2011 MacBook or 15". My MacBook screen/display flickers excessively making the mac completely unusable. I am not sure if flickering is the right term. Some times it flickers and some times turns turs completely blue or black with white

  • Adding file in SharePoint document Library using sandbox solution

     Why I am Missing Uploaded File name In Post back & am unable to add debugger give me solution thanks in advance Below is my sample code protected void btnsubmit_Click1(object sender, EventArgs e)             try                         //upload Imag

  • Org.w3c.dom. node, get number of items

    hi, how can i get the number of items of one node? thanks

  • Changing cached connections in mail

    We are running iMap for our companies email. Each mail is opening up 5 connections, which means we have 75 concurrent connections to our providers server who only allows 50. Once it goes over we start getting errors on some of the computers... I woul

  • IPhoto not launching properly

    When I launch iPhoto the menu for it appears, but there is no viewing pane. Selecting anything in the available menus result in nothing happening. What's going on?