Extracting text from a file name on export / import (Regular Expressions??)
I’m not even sure whether the File Naming option of Lightroom's publishing service supports regular expressions. Basically, I’m trying to extract the left portion of the file name, i.e. everything before the underscore “_”. When I import a file, I rename it to the current image sequence number and then append the date the photo was taken; a typical file name is “05625_2008-01-05.dng”. On export I would like the new name to be only the sequence number, in this case “05625.jpg”. Ideally I would then like to append the name of the folder that contains the file: “05625 - FolderName.jpg”.
I don’t want to go down the road of figuring out the correct syntax if regular expressions aren’t supported. Thanks in advance - CES
Is the imported sequence number captured in metadata somewhere, or is there somewhere that all of the available fields and their reference names can be found?
Unfortunately it is not available anywhere to the user (but it's still stored in at least some field I know of).
By the way, why did you choose to put the suffix at the beginning of the name (1234_2010-08-13.jpg)? The common practice is to leave the suffix at the end. That will ensure the filenames will sort in chronological order by filename and you could have easily used the suffix when exporting files. You wouldn't have this problem now.
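The thread does not confirm whether Lightroom's File Naming accepts regular expressions, so the following is only a sketch of the renaming rule itself, written in Java and meant to run outside Lightroom (for example over an export folder). The class and method names are hypothetical; the pattern simply captures everything before the first underscore.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExportName {
    // Sequence number = everything before the first underscore.
    private static final Pattern SEQUENCE = Pattern.compile("^([^_]+)_");

    /** Hypothetical helper: "05625_2008-01-05.dng" + "FolderName" becomes "05625 - FolderName.jpg". */
    static String exportName(String importedName, String folderName) {
        Matcher m = SEQUENCE.matcher(importedName);
        if (!m.find()) {
            throw new IllegalArgumentException("no underscore in " + importedName);
        }
        return m.group(1) + " - " + folderName + ".jpg";
    }

    public static void main(String[] args) {
        System.out.println(exportName("05625_2008-01-05.dng", "FolderName"));
    }
}
```

Because the rule only depends on the first underscore, it keeps working if the date format after the underscore changes.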
Similar Messages
-
Extracting string from a file name
Hello,
I have a legacy (read: I didn't build it) SharePoint list that includes some validation when uploading files that's giving me some trouble.
Basically, our users are required to add files to a list in a certain filename format and based on the naming convention are approved/rejected and routed to the appropriate location.
One of the validations looks at a section of the file name and compares it to a folder name in the library.
For example, the file name format is XX_AAA_999_2014_05.xlsx and that matches on the folder name of /submissions/2014_05
Currently the rule says look at the last 7 characters of the folder and the 7 characters starting at position 12 of the filename and make sure they match.
The problem is that the 999 in the example above is a sequential identifier for the project a file is associated with, i.e. the identifiers range from project 000 to project 999. We've now hit project 1000, so any file added for project 1000 (and beyond) fails because the starting position has shifted one spot. (Note: we have active 3-digit projects, so I cannot simply change that to position 13, not to mention what that would do to my history.)
So, my task is to come up with something that can accommodate 3- or 4-digit numbers.
I'm trying to stick as closely as possible to the original setup so I don't mess up the history, so I'm looking at other methods of getting the same data out of the string. Another problem is that the file names include the extension, and the extension can be 3 (pdf) or 4 (xlsx) characters long.
I've tried this: =LEFT([Source File Name],SEARCH(".",[Source File Name])-1)
but that brings back everything in front of the period, and I need just the 7 preceding characters. Is there a way to limit the number of characters a LEFT() function returns?
In a nutshell, the 4 variations of file names are as follows, from which I need to extract the 2014_05 section:
ZZ_AAA_999_2014_05.xls
ZZ_AAA_999_2014_05.xlsx
ZZ_AAA_1000_2014_05.xls
ZZ_AAA_1000_2014_05.xlsx
Thanks!
Kevin
Hi,
According to your description, you want to retrieve the string “2014_05” from the file name.
I would suggest you create a SharePoint Designer workflow and implement your file name handling logic there.
SharePoint Designer 2010 already has some useful utility workflow actions that help users deal with the various requirements that come from business scenarios.
For the string handling, you can consider using the Utility Actions:
http://msdn.microsoft.com/en-us/library/office/jj164026(v=office.15).aspx
Another two links about creating SharePoint Designer workflow for your reference:
http://office.microsoft.com/en-001/sharepoint-designer-help/introduction-to-designing-and-customizing-workflows-HA101859249.aspx
http://www.codeproject.com/Tips/415107/Create-a-Workflow-using-SharePoint-Designer
Thanks
Patrick Liang
Forum Support
TechNet Community Support -
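For the file name validation question above, the rule "take the 7 characters immediately before the extension" does not depend on how many digits the project number has or how long the extension is. In the calculated-column formula from the question, that would presumably be RIGHT(LEFT([Source File Name],SEARCH(".",[Source File Name])-1),7), which is untested here. The same rule as a minimal Java sketch, with a hypothetical class name:

```java
class PeriodFromFileName {
    /** Returns the 7 characters (for example "2014_05") immediately before the last dot. */
    static String period(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 7) {
            throw new IllegalArgumentException("name too short or no extension: " + fileName);
        }
        return fileName.substring(dot - 7, dot);
    }

    public static void main(String[] args) {
        String[] names = {
            "ZZ_AAA_999_2014_05.xls", "ZZ_AAA_999_2014_05.xlsx",
            "ZZ_AAA_1000_2014_05.xls", "ZZ_AAA_1000_2014_05.xlsx"
        };
        for (String name : names) {
            System.out.println(name + " -> " + period(name));
        }
    }
}
```

Counting from the right-hand end leaves existing 3-digit project files valid, so the history is not affected.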
Does IBR support to extract text from office files
hi Experts,
Can we use IBR to extract Word/Excel/PowerPoint content to a text file? Where is the documentation for this function?
Best regards
Hi,
Do you mean extraction of text based on some rules? Is it for searching on specific text from the file? If yes, then the Oracle Text search feature would do that.
If it is to populate some metadata with the extracted text, then the Content Categorizer component is what you need. For details on this component and its functionality, please go through the following documentation: http://docs.oracle.com/cd/E14571_01/doc.1111/e10978/c11_content_categorizer.htm#sthref1210
In either case, IBR is not the engine that would do this. It is solely used for document conversion.
Hope this helps .
Thanks,
Srinath -
Hi All
I want to extract only text from a pdf file.
I am trying to extract text from a PDF file using PDFBox, but I am getting an error. My code is like this:
/*
 * Main.java
 *
 * Created on den 10 september 2007, 23:01
 *
 * To change this template, choose Tools | Template Manager
 * and open the template in the editor.
 */
package extracttext;

import org.pdfbox.exceptions.InvalidPasswordException;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;
//import java.awt.Rectangle;
//import java.util.List;
import org.pdfbox.pdmodel.PDPage;

public class Main {

    /** Creates a new instance of Main */
    public Main() {
    }

    /**
     * @param args the command line arguments
     */
    public static void main( String[] args ) throws Exception {
        int startPage = 1;
        int endPage = Integer.MAX_VALUE;
        PDDocument document = null;
        try {
            document = PDDocument.load( "C:\\thesis\\fileread\\sim.pdf" );
            if( document.isEncrypted() ) {
                try {
                    // try the empty password first
                    document.decrypt( "" );
                } catch( InvalidPasswordException e ) {
                    System.err.println( "Error: Document is encrypted with a password." );
                    System.exit( 1 );
                }
            }
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setSortByPosition( true );
            stripper.setStartPage( startPage );
            stripper.setEndPage( endPage );
            System.out.println("Text: " + stripper.getText(document));
        } finally {
            // always release the document, even if extraction fails
            if( document != null ) {
                document.close();
            }
        }
    }
}
Can anybody please help me solve this problem?
Regards,
UK
I get the following error message:
Exception in thread "main" java.lang.NoClassDefFoundError: org/fontbox/afm/FontMetric
at org.pdfbox.pdmodel.font.PDFont.getAFM(PDFont.java:334)
at org.pdfbox.pdmodel.font.PDSimpleFont.getFontHeight(PDSimpleFont.java:104)
at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:336)
at org.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:80)
at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
at extracttext.Main.main(Main.java:55)
Java Result: 1
BUILD SUCCESSFUL (total time: 1 second)
I would appreciate it if you could help me write a Java program that can extract only text from a PDF file -
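A note on the PDFBox stack trace above: NoClassDefFoundError for org/fontbox/afm/FontMetric means the class was not found at run time, and the org.fontbox package belongs to FontBox, a separate library that this PDFBox version depends on. Adding the FontBox jar to the project's classpath is the usual remedy. The sketch below is a hypothetical helper, not part of PDFBox, that checks whether a class is visible before the extraction starts.

```java
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isPresent(String className) {
        try {
            // 'false' = locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String needed = "org.fontbox.afm.FontMetric"; // class name taken from the stack trace
        System.out.println(needed + (isPresent(needed)
                ? " found"
                : " missing: add the FontBox jar to the classpath"));
    }
}
```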
File-File - Need to extract data from source file name???
Hello Experts,
I have a unique situation. In my file-to-file scenario, the source file name has the format XYZ_yymmddHHMM.dat. There is a field in the target file which has to be filled with the date contained in the source file name (yymmdd). How can this be achieved? Normally we do it the other way round, using variable substitution, where we can name a file depending on the value of any field in the target structure.
Please help.
Regards,
Yash
Hi,
Please prepare a UDF with the following code; I mean, use the dynamic configuration concept:
you get the file name, then use a substring function to capture the date from the right side.
//write your code here
// getFileName User Defined Function
// function to create name of output file
String filename;
filename = strFile;
try {
    // initialize DynamicConfiguration to create the file with the given name
    DynamicConfiguration conf = (DynamicConfiguration) container
        .getTransformationParameters()
        .get(StreamTransformationConstants.DYNAMIC_CONFIGURATION);
    DynamicConfigurationKey key = DynamicConfigurationKey.create( "http://sap.com/xi/XI/System/File", "FileName");
    //create file with the specified name
    conf.put(key, filename);
} catch (Exception ex) {
    // ignored: fall through and return the name unchanged
}
return filename;
warm regards
mahesh. -
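The UDF posted above writes a file name into the dynamic configuration; for the question as asked, the name would be read instead (presumably with the corresponding get call on the same key) and the date cut out with a substring. The substring step alone can be sketched in plain Java. The class name is hypothetical and the XYZ_yymmddHHMM.dat layout is taken from the question.

```java
class SourceFileDate {
    /** Extracts the six yymmdd characters that follow the last underscore. */
    static String dateOf(String fileName) {
        int underscore = fileName.lastIndexOf('_');
        if (underscore < 0 || fileName.length() < underscore + 7) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        return fileName.substring(underscore + 1, underscore + 7);
    }

    public static void main(String[] args) {
        System.out.println(dateOf("XYZ_0709101530.dat"));
    }
}
```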
Extracting text from PDF files produced by Oracle reports
Hi,
I am currently using Report Builder 9.0.4.0.21 to produce reports in PDF format.
The pdf reports were displayed to screen and printed to printer correctly.
However, doing a copy-and-paste from the pdf report to a text editor produces
garbage characters. Also, I failed to extract the text using any of available adobe
plug-ins. I know that the PDF report is using font subsetting with custom
encoding. I have already read the PDF reference manual and it seems that
the PDF report is missing the mapping tables to convert the custom encoding
used in the report back to ansi or unicode.
Is there a solution to this problem?
Are there any environment variables or settings that I am missing?
Your help is really appreciated.
Hello,
Your problem may be related to a limitation in the PDF generated with Reports 9.0.2 / 9.0.4 when using subsetting:
Font Subsetting Creates PDF Output not Searchable with Acrobat Reader (Doc ID 311345.1)
This limitation no longer exists in Reports 10.1.2 / 11.1
Regards -
How to extract text from a PDF file?
Hello Suners,
I need to know how to extract text from a PDF file.
Does anyone know what the character encoding in a PDF file is? When I use an input stream to read the file, it gives encrypted-looking characters, not the original text in the file.
Are there any procedures I should follow while reading a PDF file?
File f=new File("D:/File.pdf");
FileReader fr=new FileReader(f);
BufferedReader br=new BufferedReader(fr);
String s=br.readLine();
Any help will be deeply appreciated.
jverd wrote:
> First, you set i once, and then loop without ever changing it. So your loop body will execute either 0 times or infinitely many times, writing the same byte every time. Actually, maybe it'll execute once and then throw an ArrayIndexOutOfBoundsException. That's basic java looping, and you're going to need a firm grip on that before you try to do anything as advanced as PDF reading.
Oops, you are absolutely right, that was a silly mistake.
> Second, what do the docs for getPageContent say? Do they say that it simply gives you the text on the page as if the thing were a simple text doc? I'd be surprised if that's the case.
getPageContent returns an array of bytes, so the question becomes: how do I get text from this array? I was thinking of:
private void jButton1_actionPerformed(ActionEvent e) {
    PdfReader read;
    StringBuffer buff = new StringBuffer();
    try {
        read = new PdfReader("d:/getjobid2727.pdf");
        read.getMetaData();
        // content stream of page 1: PDF drawing operators, not plain text
        byte[] data = read.getPageContent(1);
        // append the bytes once as characters; the original while(i>-1) loop never ended normally
        buff.append(new String(data));
        String str = buff.toString();
        FileOutputStream fos = new FileOutputStream("D:/test.txt");
        Writer out = new OutputStreamWriter(fos, "UTF8");
        out.write(str);
        out.close();
        read.close();
    } catch (Exception f) {
        f.printStackTrace();
    }
}
"D:/test.txt" hasn't been created when I ran the program.
Are my steps right? -
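On the "encrypted characters" seen when reading a PDF with FileReader in the thread above: page content in a PDF is usually stored in compressed streams (commonly the Flate filter, i.e. zlib/deflate), so the raw bytes are not readable text even when the document is not encrypted. The sketch below only illustrates that effect with java.util.zip on a made-up content fragment. It is not a PDF text extractor, and real extraction also has to interpret the text operators and font encodings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateDemo {
    /** Compresses a short input the way a Flate-encoded stream is stored. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length + 64]; // roomy enough for this short demo input
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    /** Reverses deflate(); originalLength must be the uncompressed size. */
    static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buffer = new byte[originalLength];
        try {
            int length = inflater.inflate(buffer);
            return Arrays.copyOf(buffer, length);
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // made-up fragment in the style of a page content stream
        byte[] text = "BT (Hello from a PDF page) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] stored = deflate(text);
        System.out.println("as stored:     " + new String(stored, StandardCharsets.ISO_8859_1));
        System.out.println("after inflate: " + new String(inflate(stored, text.length), StandardCharsets.US_ASCII));
    }
}
```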
How to extract text from a PDF file using php?
How to extract text from a PDF file using php?
thanks
fabio
> Do you know of any other way this can be done?
There are many ways, but this is out of scope for this forum. You can try this forum: http://forum.planetpdf.com/ -
How can I extract text from PowerPoint files, Word files, PDF files
Hi friends,
I need to extract text from PowerPoint files, Word files and PDF files for my application. Is it possible to extract the text from those files? If yes, please give a solution to this problem. I would be thankful.
My reply would be the same.
http://forum.java.sun.com/thread.jspa?threadID=676559&tstart=0 -
Extracting text from .doc,.ppt,.pdf files
How can I extract ASCII text from file types like .doc, .ppt, .pdf, .xls, etc.?
Any tips/hints would be helpful
Thanks
Rama
Hi, I tried for PDF, but didn't succeed.
The following is for text/Doc files:
<pre>
import java.io.*;

public class Doc
{
    public static void main(String[] args)
    {
        try {
            File file = new File("c:\\downloads\\WP2001.doc");
            LineNumberReader buffer = new LineNumberReader(new FileReader(file));
            StringBuffer buff = new StringBuffer("");
            String line;
            // readLine() returns null at end of file; the reader tracks line numbers itself
            while ((line = buffer.readLine()) != null)
            {
                //System.out.println(line);
                buff.append(line).append("\n");
            }
            buffer.close();
            System.out.println(buff);
        }
        catch (Exception fne)
        {
            System.out.println("Error reading file: " + fne);
        }
    }
}
</pre>
pathreading -
Reading long text from excel file to an internal table
Hi
Can anybody tell me how to read long text from an Excel file into an internal table?
When I use the FM KCD_EXCEL_OLE_TO_INT_CONVERT, it reads only 32 characters from each cell.
But one of the cells in my Excel sheet has very long text, which I need to upload into an internal table.
May I know which FM or what logic I need to use for this problem?
Regards
Hi,
Here is an example program. It will upload an Excel file with two columns. You could also assign the Excel structure dynamically, but I wanted to keep the example simple. The main point is that the internal table (it_excel in this example) must match the Excel structure that you want to convert.
Remember, this is just an example to help you figure out how to properly use the technique. It will certainly need to be modified to fit your requirements, and as always there may be a better way to get the Excel converted... this is just one possibility that has worked for me in the past.
*& Report zexcel_upload_test *
REPORT zexcel_upload_test.
TYPE-POOLS: truxs.
TYPES: BEGIN OF ty_excel,
col_a(10) TYPE n,
col_b(35) TYPE c,
END OF ty_excel.
DATA: l_data_tab TYPE TABLE OF string,
l_text_data TYPE truxs_t_text_data,
l_gui_filename TYPE string,
it_excel TYPE TABLE OF ty_excel.
FIELD-SYMBOLS: <wa_excel> TYPE ty_excel.
PARAMETERS: p_file TYPE rlgrap-filename.
* Pass the file name in the correct format
l_gui_filename = p_file.
* Upload data from PC
CALL METHOD cl_gui_frontend_services=>gui_upload
EXPORTING
filename = l_gui_filename
filetype = 'ASC'
has_field_separator = 'X'
CHANGING
data_tab = l_data_tab
EXCEPTIONS
file_open_error = 1
file_read_error = 2
no_batch = 3
gui_refuse_filetransfer = 4
invalid_type = 5
no_authority = 6
unknown_error = 7
bad_data_format = 8
header_not_allowed = 9
separator_not_allowed = 10
header_too_long = 11
unknown_dp_error = 12
access_denied = 13
dp_out_of_memory = 14
disk_full = 15
dp_timeout = 16
OTHERS = 17.
IF sy-subrc <> 0.
* MESSAGE ...
EXIT.
ENDIF.
* Convert from Excel into the appropriate itab
l_text_data[] = l_data_tab[].
CALL FUNCTION 'TEXT_CONVERT_XLS_TO_SAP'
EXPORTING
i_field_seperator = 'X'
i_tab_raw_data = l_text_data
i_filename = p_file
TABLES
i_tab_converted_data = it_excel
EXCEPTIONS
conversion_failed = 1
OTHERS = 2.
IF sy-subrc <> 0.
* MESSAGE ...
EXIT.
ENDIF.
LOOP AT it_excel ASSIGNING <wa_excel>.
* Do something here...
ENDLOOP.
AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_file.
PERFORM filename_get CHANGING p_file.
* FORM filename_get *
FORM filename_get CHANGING p_in_file TYPE rlgrap-filename.
DATA: l_in_file TYPE string,
l_filetab TYPE filetable,
wa_filetab TYPE LINE OF filetable,
l_rc TYPE i,
l_action TYPE i,
l_init_dir TYPE string.
* Set the initial directory to whatever you want it to be
l_init_dir = 'C:\'.
* Call the file open dialog without multiselect
CALL METHOD cl_gui_frontend_services=>file_open_dialog
EXPORTING
window_title = 'Load file'
default_extension = '.XLS'
default_filename = l_in_file
initial_directory = l_init_dir
multiselection = ' '
CHANGING
file_table = l_filetab
rc = l_rc
user_action = l_action
EXCEPTIONS
file_open_dialog_failed = 1
cntl_error = 2
error_no_gui = 3
OTHERS = 4.
IF sy-subrc <> 0.
REFRESH l_filetab.
ENDIF.
* Read the selected filename
READ TABLE l_filetab INTO wa_filetab INDEX 1.
IF sy-subrc = 0.
p_in_file = wa_filetab-filename.
ENDIF.
ENDFORM. " filename_get
Regards,
Jamie -
Problem to extract text from HTML document
I have to extract some text from HTML files into my database (about 1000 files).
The HTML files come from the ACM Digital Library: http://portal.acm.org/dl.cfm
Each HTML page holds the information about a paper. I only want to get the text of "Title", "Abstract", "Classification" and "Keywords".
The problem is that I can't find any pattern for parsing the HTML files.
Example: I need to get Classification = "Theory of Computation", "ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY", "Numerical Algorithms and Problems", "Mathematics of Computing", "NUMERICAL ANALYSIS", etc.
The section of code for "Classification" is below.
Please give me any ideas for doing this, or for finding a pattern to extract text from it.
<div class="indterms"><a href="#CIT"><img name="top" src=
"img/arrowu.gif" hspace="10" border="0" /></a><span class=
"heading"><a name="IndexTerms">INDEX TERMS</a></span>
<p class="Categories"><span class="heading"><a name=
"GenTerms">Primary Classification:</a></span><br />
&nbsp; <b>F.</b> <a href=
"results.cfm?query=CCS%3AF%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Theory of Computation</a><br />
&nbsp; <img src="img/tree.gif" border="0" height="20" width=
"20" /> <b>F.2</b> <a href=
"results.cfm?query=CCS%3A%22F%2E2%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">ANALYSIS OF ALGORITHMS AND PROBLEM
COMPLEXITY</a><br />
&nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
"20" width="20" /> <b>F.2.1</b> <a href=
"results.cfm?query=CCS%3A%22F%2E2%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Numerical Algorithms and Problems</a><br />
</p>
<p class="Categories"><span class="heading"><a name=
"GenTerms">Additional&nbsp;Classification:</a></span><br />
&nbsp; <b>G.</b> <a href=
"results.cfm?query=CCS%3AG%2E%2A&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Mathematics of Computing</a><br />
&nbsp; <img src="img/tree.gif" border="0" height="20" width=
"20" /> <b>G.1</b> <a href=
"results.cfm?query=CCS%3A%22G%2E1%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">NUMERICAL ANALYSIS</a><br />
&nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border="0" height=
"20" width="20" /> <b>G.1.6</b> <a href=
"results.cfm?query=CCS%3A%22G%2E1%2E6%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Optimization</a><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <img src="img/tree.gif" border=
"0" height="20" width="20" /> <b>Subjects:</b> <a href=
"results.cfm?query=CCS%3A%22Linear%20programming%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Linear programming</a><br />
</p>
<br />
<p class="GenTerms"><span class="heading"><a name=
"GenTerms">General Terms:</a></span><br />
<a href=
"results.cfm?query=genterm%3A%22Algorithms%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Algorithms</a>, <a href=
"results.cfm?query=genterm%3A%22Theory%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Theory</a></p>
<br />
<p class="keywords"><span class="heading"><a name=
"Keywords">Keywords:</a></span><br />
<a href=
"results.cfm?query=keyword%3A%22Simplex%20method%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">Simplex method</a>, <a href=
"results.cfm?query=keyword%3A%22complexity%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">complexity</a>, <a href=
"results.cfm?query=keyword%3A%22perturbation%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">perturbation</a>, <a href=
"results.cfm?query=keyword%3A%22smoothed%20analysis%22&coll=ACM&dl=ACM&CFID=22820732&CFTOKEN=38147335"
target="_self">smoothed analysis</a></p>
</div>
One approach is to download HTML Parser from SourceForge (http://htmlparser.sourceforge.net/) and write the rules to match title, abstract, etc.
Another approach is to write your own parser that extracts only title, abstract, etc.:
1. Tokenize the HTML file, i.e. convert the HTML into tokens (tag and value).
2. Write a simple parser to extract certain information.
Find out the pattern of the text you want to extract, for instance <p class="abstract">. Then write a rule for extracting the abstract, such as
if (tag is abstract) then extract abstract text
and apply the same concept to the other tags.
Attached is the sample parser that was used to extract title and abstract from ACM HTML files. Please modify it to include keywords and other fields.
Good luck
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ACMHTMLParser {
    private String m_filename;
    private URLLexicalAnalyzer lexical;
    List urls = new ArrayList();

    public ACMHTMLParser(String filename) {
        super();
        m_filename = filename;
    }

    /**
     * parses only title and abstract
     */
    public void parse() throws Exception {
        lexical = new URLLexicalAnalyzer(m_filename);
        String word = lexical.getNextWord();
        boolean isabstract = false;
        while (null != word) {
            if (isTag(word)) {
                if (isTitle(word)) {
                    System.out.println("TITLE: " + lexical.getNextWord());
                } else if (isAbstract(word) && !isabstract) {
                    parseAbstract();
                    isabstract = true;
                }
            }
            word = lexical.getNextWord();
        }
        lexical.close();
    }

    public static void main(String[] args) throws Exception {
        ACMHTMLParser parser = new ACMHTMLParser("./acm_html.html");
        parser.parse();
    }

    public static boolean isTag(String word) {
        return ( word.startsWith("<") && word.endsWith(">"));
    }

    public static boolean isTitle(String word) {
        return ( "<title>".equals(word));
    }

    //please modify according to the html source
    public static boolean isAbstract(String word) {
        return ( "<p class=\"abstract\">".equals(word));
    }

    // prints the first non-tag token after the abstract tag
    private void parseAbstract() throws Exception {
        while (true) {
            String abs = lexical.getNextWord();
            if (abs == null) break; // end of file reached before any text
            if (!isTag(abs)) {
                System.out.println(abs);
                break;
            }
        }
    }
}

class URLLexicalAnalyzer {
    private BufferedReader m_reader;
    // true when the previous value scan already consumed the '<' of the next tag
    private boolean isTag;

    public URLLexicalAnalyzer(String filename) {
        try {
            m_reader = new BufferedReader(new FileReader(filename));
        } catch (IOException io) {
            System.out.println("ERROR, file not found " + filename);
            System.exit(1);
        }
    }

    public URLLexicalAnalyzer(InputStream in) {
        m_reader = new BufferedReader(new InputStreamReader(in));
    }

    public void close() {
        try {
            if (null != m_reader) m_reader.close();
        } catch (IOException ignored) {}
    }

    // returns the next token (a whole tag or the text between tags), or null at end of file
    public String getNextWord() throws IOException {
        int c = m_reader.read();
        if (-1 == c) return null;
        if (Character.isWhitespace((char)c))
            return getNextWord();
        if ('<' == c || isTag)
            return scanTag(c);
        else
            return scanValue(c);
    }

    private String scanTag(final int c) throws IOException {
        StringBuffer result = new StringBuffer();
        if ('<' != c) result.append('<');
        result.append((char)c);
        int ch = -1;
        while (true) {
            ch = m_reader.read();
            if (-1 == ch) throw new IllegalArgumentException("un-terminate tag");
            if ('>' == ch) {
                isTag = false;
                break;
            }
            result.append((char)ch);
        }
        result.append((char)ch); // append the closing '>'
        return result.toString();
    }

    private String scanValue(final int c) throws IOException {
        StringBuffer result = new StringBuffer();
        result.append((char)c);
        int ch = -1;
        while (true) {
            ch = m_reader.read();
            if (-1 == ch) throw new IllegalArgumentException("un-terminate value");
            if ('<' == ch) {
                isTag = true;
                break;
            }
            result.append((char)ch);
        }
        return result.toString();
    }
} -
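For the classification and keyword values in the ACM thread above, the sample HTML suggests a simple pattern: each wanted term is the text of a link whose tag starts with "a href", while section labels use "a name". The sketch below applies that observation with java.util.regex. It is an assumption drawn from the one posted fragment and the class name is hypothetical. Regular expressions over HTML are fragile, so a real HTML parser is the safer route for 1000 files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorTexts {
    // Text between <a href=...> and </a>; DOTALL because the sample wraps lines inside tags and text.
    private static final Pattern LINK =
            Pattern.compile("<a\\s+href=[^>]*>(.*?)</a>", Pattern.DOTALL);

    /** Returns the visible text of every href link, with nested tags removed and whitespace collapsed. */
    static List<String> linkTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String text = m.group(1)
                    .replaceAll("<[^>]*>", "")   // drop nested tags such as <img ...>
                    .replaceAll("\\s+", " ")     // join text that was wrapped across lines
                    .trim();
            if (!text.isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<a name=\n\"GenTerms\">Primary Classification:</a><br />\n"
                + "<b>F.2</b> <a href=\n\"results.cfm?query=x\"\ntarget=\"_self\">ANALYSIS OF ALGORITHMS AND PROBLEM\nCOMPLEXITY</a>";
        System.out.println(linkTexts(html));
    }
}
```

Restricting the search to one paragraph (for example the one with class "keywords") before calling the helper would separate keywords from classification terms.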
Applescript or workflow to extract text from PDF and rename PDF with the results
Hi Everyone,
I get supplied hundreds of PDFs which each contain a stock code, but the PDFs themselves are not named consistently, or they are supplied as multi-page PDFs.
What I need to do is name each PDF with the code which is in the text on the PDF.
It would work like this in an ideal world:
1. Split PDF into single pages
2. Extract text from PDF
3. Rename PDF using the extracted text
I'm struggling with part 3!
I can get a textfile with just the code (using a call to BBEDIT I'm extracting the code)
I did think about using a variable for the name, but the rename function doesn't let me use variables.
Hello
You may also try the following AppleScript script, which is a wrapper around a RubyCocoa script. It will ask you to choose the source PDF files and a destination directory. Then it will scan the text of each page of the PDF files for the predefined pattern and save the page as a new PDF file, named with the string extracted by the pattern, in the destination directory. Pages which do not contain a string matching the pattern are ignored. (Ignored pages, if any, are reported in the result of the script.)
Currently the regex pattern is set to:
/HB-.._[0-9]{6}/
which means HB- followed by two characters and _ and 6 digits.
Minimally tested under 10.6.8.
Hope this may help,
H
_main()
on _main()
script o
property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
default location (path to desktop) with multiple selections allowed
set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
default location (path to desktop)
set args to ""
repeat with a in my aa
set args to args & a's POSIX path's quoted form & space
end repeat
considering numeric strings
if (system info)'s system version < "10.9" then
set ruby to "/usr/bin/ruby"
else
set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
end if
end considering
do shell script ruby & " <<'EOF' - " & args & "
require 'osx/cocoa'
include OSX
require_framework 'PDFKit'
outdir = ARGV.shift.chomp('/')
ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
url = NSURL.fileURLWithPath(f)
doc = PDFDocument.alloc.initWithURL(url)
path = doc.documentURL.path
pcnt = doc.pageCount
(0 .. (pcnt - 1)).each do |i|
page = doc.pageAtIndex(i)
page.string.to_s =~ /HB-.._[0-9]{6}/
name = $&
unless name
puts \"no matching string in page #{i + 1} of #{path}\"
next # ignore this page
end
doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
puts \"failed to save page #{i + 1} of #{path}\"
end
end
end
EOF"
end script
tell o to run
end _main -
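The pattern used in the script above, HB- followed by any two characters, an underscore and six digits, can be tried on sample page text before running it over hundreds of PDFs. A minimal Java sketch with a hypothetical class name; the sample strings are made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StockCode {
    // Same pattern as in the script: HB-, two arbitrary characters, underscore, six digits.
    private static final Pattern CODE = Pattern.compile("HB-.._[0-9]{6}");

    /** Returns the first stock code found in the page text, or null if there is none. */
    static String firstCode(String pageText) {
        Matcher m = CODE.matcher(pageText);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstCode("Item HB-AB_123456 rev 2"));
        System.out.println(firstCode("page without a code"));
    }
}
```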
Javascript in .PDF's - Extracting text from .doc or .txt
Hello All,
I am very new to JavaScript in PDFs, but I seem to find my way around doing misc. work with forms. What I need:
I need a Form with a Submit button that locates and extracts the text from a file and places it into another field.
Example:
on Server:
one.txt or one.doc
two.txt or two.doc,
...etc
You type one in the form and submit -- it pulls all of the txt from one.txt off the server and places it into a field.
Also, if there is any way to do this with tables to avoid multiple files, that would be even better.
I know I am a newbie, but this would be a game-changer for what I do.
Thank you.
Thanks for the advice
It is accessing a shared file server (among employees) and it is to be a .pdf used in Adobe Acrobat Professional
Basically I want it to be a form that pulls txt based on what was in the typed box or drop-down menu from a .txt or .doc -
How do I grab a value from a file name and load it in a field/column?
Hi,
I am loading this .txt file (OUS_RAW_NYC_05_2011.txt) into an internal table i_raw.
I want to pick out the NYC characters from the file name and fill it as value for <wa_raw>-field1 for all records.
How do I do this?
Please advise.
Thanks!
Hi Durgesh,
I am doing this in a program via SE38 and not via transformation routine.
Now I am working on this piece of code to get the value.
file_str = //rdmsbw/dev/data/output/all/OUS_RAW_HCM_05_2011.txt
I only want the characters HCM from file_str.
When I execute this code below:
MOVE file_str TO org_unit.
WRITE / org_unit+37(12).
<wa_raw>-/BIC/ZOUORGUT = org_unit.
my output is = //rdm
How do I extract HCM?
Please advise.
PS: please also help me out with my other post
http://forums.sdn.sap.com/thread.jspa?threadID=2141618&tstart=0
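In the path posted above, HCM does start at offset 37, but the assignment copies org_unit from its first character, which would explain the //rdm result if the target field is five characters long. A fixed offset also breaks as soon as the directory changes. Splitting the base file name on underscores avoids both problems. ABAP has a SPLIT statement for this, while the sketch below shows the same idea in Java with a hypothetical class name.

```java
class OrgUnitFromPath {
    /** Returns the third underscore-separated token of the base file name, e.g. HCM. */
    static String orgUnit(String path) {
        String baseName = path.substring(path.lastIndexOf('/') + 1);
        String[] parts = baseName.split("_");
        if (parts.length < 3) {
            throw new IllegalArgumentException("unexpected file name: " + baseName);
        }
        return parts[2];
    }

    public static void main(String[] args) {
        System.out.println(orgUnit("//rdmsbw/dev/data/output/all/OUS_RAW_HCM_05_2011.txt"));
    }
}
```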