Metadata extract from PDF / Photoshop files

PDF and Photoshop files can have embedded metadata.
When checking these files into a repository, can the metadata be searched for, extracted, and then used to fill in the UCM metadata fields?
Thanks ...

Indeed very strange. The Description tab in File Info also reflects my
text for the IPTC description in the Bridge metadata panel. That said, I just
peeked at the Extensis forum page, and it might be related to a
different approach to XMP data. You had better look a little deeper
yourself: do a search for IPTC description on this forum page (I don't know if
you have the single-user or the server version):
http://forums.extensis.com/jforum/forums/list.page
In the meantime, some workarounds. First, try purging the cache for the
folders using the Tools/Cache menu in Bridge. I don't think it will help you,
but you can try.
If the problem is solved by resaving in PS, you can create an action for it
in the Actions panel (an action can either replace the existing file or save
to a new folder) and use a batch to deal with a large number of files
in one go.
I don't know if it is possible, but you might change your workflow and embed
metadata using Bridge. I have tried Portfolio several times myself but
found it way too slow and very unproductive, but that is entirely my opinion.

I've just played around with this a bit more. If I go back to the Photoshop
file and re-save it (without making any other changes), then go back to
Bridge, all the metadata miraculously appears. But if Photoshop can see it in
the first instance, why won't Bridge pick it up without the file being
re-saved?

Similar Messages

Problem parsing XML with schema when extracted from a jar file

I am having a problem parsing XML with a schema, both of which are extracted from a jar file. I am using ZipFile to get InputStream objects for the appropriate ZipEntry objects in the jar file. My XML is encrypted, so I decrypt it to a temporary file. I am then attempting to parse the temporary file with the schema using DocumentBuilder.parse.
I get the following exception:
org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element '<root element name>'
This was all working OK before I jarred everything (i.e. when I was using standalone files, rather than InputStreams retrieved from a jar).
I have output the retrieved XML to a file and compared it with my original source and they are identical.
I am baffled because the nature of the exception suggests that the schema has been read and parsed correctly but the XML file is not parsing against the schema.
Any suggestions?
The code is as follows:
public void open(File input) throws IOException, CSLXMLException {
    // The schema is read from the jar; input is the decrypted temporary XML file.
    InputStream schema = ZipFileHandler.getResourceAsStream("<jar file name>", "<schema resource name>");
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = null;
    try {
      // Configure a namespace-aware, validating parser that uses the schema.
      factory.setNamespaceAware(true);
      factory.setValidating(true);
      factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
      factory.setAttribute(JAXP_SCHEMA_SOURCE, schema);
      builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new CSLXMLParseHandler());
    } catch (Exception builderException) {
      throw new CSLXMLException("Error setting up SAX: " + builderException.toString());
    }
    Document document = null;
    try {
      document = builder.parse(input);
    } catch (SAXException parseException) {
      throw new CSLXMLException(parseException.toString());
    }
}

I was originally using getSystemResource, which worked fine until I jarred the application. The problem appears to be that resources returned from a jar file cannot be used in the same way as resources returned directly from the file system. You have to use the ZipFile class (or its JarFile subclass) to locate the ZipEntry in the jar file and then use ZipFile.getInputStream(ZipEntry) to convert this to an InputStream. I have seen example code where an InputStream is used for the JAXP_SCHEMA_SOURCE attribute but, for some reason, this did not work with the InputStream returned by ZipFile.getInputStream. Like you, I have also seen examples that use a URL, but they appear to be URLs that point to a file, not URLs that point to an entry in a jar file.
Maybe there is another way around this, but writing to a file works, and I use File.deleteOnExit() to ensure things are tidied up afterwards.
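
For comparison, here is a minimal sketch of one other route that is often tried: letting the javax.xml.validation API load the schema from the URL that Class.getResource returns, which can also be a jar: URL pointing at an entry inside a jar. This is not the poster's code and is not confirmed to fix the exception above. The class name JarSchemaParser, the helper names, and the resource name "/schema.xsd" are assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class; not part of the original poster's code.
final class JarSchemaParser {

    /** Builds a schema Source from a classpath resource, e.g. "/schema.xsd" (assumed name). */
    static Source classpathSchema(String resourceName) {
        URL url = JarSchemaParser.class.getResource(resourceName);
        if (url == null) {
            throw new IllegalArgumentException("Resource not found: " + resourceName);
        }
        // Passing the URL as the system ID lets the parser open the stream itself.
        return new StreamSource(url.toExternalForm());
    }

    /** Parses and validates the XML; validation errors are rethrown as SAXException. */
    static Document parse(InputStream xml, Source schemaSource)
            throws SAXException, IOException, ParserConfigurationException {
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(schemaSource);

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Used instead of setValidating(true) and the JAXP_SCHEMA_* attributes.
        factory.setSchema(schema);

        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return builder.parse(xml);
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-ins so the sketch runs without a jar; in a jarred
        // application the schema would come from classpathSchema("/schema.xsd").
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"root\" type=\"xs:string\"/></xs:schema>";
        InputStream xml =
                new ByteArrayInputStream("<root>hello</root>".getBytes(StandardCharsets.UTF_8));
        Document doc = parse(xml, new StreamSource(new StringReader(xsd)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The XML itself can still be supplied as an InputStream (for example the decrypted data), so with this approach only the schema lookup changes.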

Photoshop 10 can't open image CD's from another Photoshop file.

My Windows 7 and Photoshop Elements 10 will not open image CDs downloaded from another Photoshop file. I could open them with a previous Windows XP and an older Photoshop. Now I get the message: "This file does not have a program associated with it for performing action. Please install a program or, if one is already installed, create an association in the Default Program default control panel."
There are hundreds of association files listed. WHICH ONE DO I CHOOSE????

Right-click on the file and choose Properties.
To the right of “Opens With” click on the Change button.
In the Open With dialog click on the Browse button.
Navigate to:
Program Files (X86)\Adobe\Photoshop Elements 10
Select (highlight) PhotoshopElementsEditor.exe
Click Apply
Click OK

Extracting from an xml file..

Hi.. I am required to write a program that extracts stock numbers and performs certain calculations with them. I have been able to complete the calculations part but need a little help (ideas, guidelines) with extracting stuff from an XML file that has six lines..
<?xml version="1.0"?>
<Portfolio>
<Investment><Stock symbol="RY"></Stock><Qty>15</Qty><Comment>this is a good one; esp. the BookValue</Comment><BookValue>58.5</BookValue></Investment>
<Investment><Comment>this is not "bad"</Comment><Stock symbol = "NT"></Stock><Qty>10</Qty> <BookValue>108</BookValue></Investment>
<Investment><Qty>2</Qty><BookValue>45.75</BookValue><Comment>not sure about this; >= last time</Comment><Stock symbol= "BMO" > </Stock></Investment>
</Portfolio>
I'm not sure if this post will display properly but the correct format is this..
Line 1: <?xml version=...
Line 2: <Portfolio...
Line 3: <Investment...
Line 4: <investment...
Line 5: <investment...
Line 6: </Portfolio.....
The stuff i'm supposed to extract are..
RY 15 58.50
NT 10 108.00
BMO 2 45.75
I am new to Java and I am still beginning to realize the many methods that enable me to do this.. if anyone has any ideas please let me know.. thank you for your time.
(I'm not requesting actual coding..) Thanks again.

Hey kk.. thanks for helping out, but unfortunately I can't use a parser like you did.. since I'm taking an introductory course, we're limited to a small number of classes (we're not supposed to know much, so we have to work with what we have). This is the code I have so far..
public class Test
{
     public static void main(String[] args)
     {
          YorkReader reader = new YorkReader("pfxml.a1.xml");
          String line = reader.readLine();
          String tag = line.substring(0,7);
          String investTag = "<Invest";
          while (!tag.equalsIgnoreCase(investTag))
          {
               York.println(line);
               line = reader.readLine();
               tag = line.substring(0,7);
          }
          while(tag.equalsIgnoreCase(investTag))
          {
               York.println(line);
               line = reader.readLine();
               StringTokenizer st = new StringTokenizer(line, "Stock");
               String line1 = st.nextToken();
               StringTokenizer stSymbol = new StringTokenizer(st.nextToken(), "\"");
               String symbolS = stSymbol.nextToken();
               York.println();
               York.println(line1);
               York.println();
               tag = line.substring(0,7);
          }
          reader.close();
     }
}
I've managed to get the program to skip lines that don't start with the string "<Invest", and once I get to an Investment line I'm planning to use a loop and extract the stuff I need out of the line.. but for some reason (IT'S DRIVING ME CRAZY), when I try to use the StringTokenizer class to parse out what I need, it doesn't work. You see, in the code I used the string "Stock" as the delimiter, but when I check to see if the code is working, it's not returning what I'm asking for.
When I ask it to print out the first token it returns this..
<Inves
and when I ask for the second it gives me..
men
What is going on... argh.. I thought I had a great idea for this too. What I was planning was to use "Stock" as the delimiter and get the string
symbol="RY"></
then use the tokenizer again with " as the delimiter to get the RY string, which is what I want.. but for some reason it's not doing that. If you have time, could you take a look at my code and let me know what's wrong? Thanks a lot for your help.. I really appreciate it.
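
The output above is what StringTokenizer is documented to do: the second constructor argument is a set of delimiter characters, not a delimiter string. With "Stock" the tokenizer splits on any of S, t, o, c or k, so "<Investment>" breaks at the first "t" (giving "<Inves") and again at the next "t" (giving "men"). The sketch below reproduces that behaviour and shows one possible alternative that sticks to plain String methods (indexOf and substring). The class name StockSymbolDemo and the method extractSymbol are made up for illustration, and it assumes that "<Stock" does not also appear inside a Comment.

```java
import java.util.StringTokenizer;

// Hypothetical demo class; names are illustrative only.
final class StockSymbolDemo {

    /**
     * Returns the text between the first pair of double quotes that follows
     * "<Stock", or null if the line has no such tag or no quoted value.
     */
    static String extractSymbol(String line) {
        int stock = line.indexOf("<Stock");
        if (stock < 0) {
            return null;
        }
        int open = line.indexOf('"', stock);      // opening quote of the symbol
        if (open < 0) {
            return null;
        }
        int close = line.indexOf('"', open + 1);  // closing quote of the symbol
        if (close < 0) {
            return null;
        }
        return line.substring(open + 1, close).trim();
    }

    public static void main(String[] args) {
        String line = "<Investment><Stock symbol=\"RY\"></Stock><Qty>15</Qty></Investment>";

        // Each character of "Stock" is treated as a separate delimiter.
        StringTokenizer st = new StringTokenizer(line, "Stock");
        System.out.println(st.nextToken()); // <Inves
        System.out.println(st.nextToken()); // men

        // Searching for the whole tag name avoids the problem.
        System.out.println(extractSymbol(line)); // RY
    }
}
```

Because the search starts at "<Stock", quotes that appear earlier on the line (as in the Comment of the NT line) are skipped.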

Reg Extracting data from PDF using file adapter

Hi Experts,
In my business process I will receive different files in PDF form. I have to extract the fields from each file and send them to the ECC system. Can anyone suggest how to do this without using CA?
Regards
Suresh

You might have to use a custom solution.
You will find tips in this thread: "Trouble writing out a PDF in XI/PI?"

Extracting from PDF in Preview (bad quality)

I have a huge PDF document, but need only a few of those pages, so I thought I'd extract the desired pages.
Having the PDF file opened in Preview, and on the first page I wanted to save I went to "File"-"Save as" and chose a suitable format.
The problem however is that the resulting saved file is of terrible quality. Nothing like the original PDF file at all.
I've tried a variety of formats: Photoshop, TIFF, PNG and so on.
What am I doing wrong?

Hi,
I'm not sure about the quality issue, but as an alternative, have you tried choosing the desired page numbers in the Print dialog and selecting 'PDF > Save as PDF...' at the bottom to save out the pages?

Unable to download metadata extracts from HFM

I have installed 11.1.2.3 using an Oracle database on Windows Server 2008. After extracting a metadata file, when I click on Download I am unable to download the file. At the bottom of the screen there is a message "http://servername:9000/hfmadf/faces/hfm.jspx? accessibility mode= false".
I would appreciate it if someone could help me to resolve this issue.
Thank you

HornHonker wrote:
I was planning on buying my kids some new 4th gen iPod touch units and am considering a new iPhone myself, but if Apple are going to brutally excommunicate earlier generations of hardware this way, then I will be going Android instead :-(
It has nothing to do with the hardware, and not all devices seem to be affected. If you bought 4th gen iPods, they wouldn't have this problem. But the problem has nothing to do with what generation the iPod is. As for yourself getting a new phone, if you're not going to get an iPhone, I strongly recommend a Windows 7 phone or a Blackberry phone before Android. But I hope your problem gets worked out and your next phone is an iPhone. Good luck

Data Extraction from PDF

I am creating a large PDF form that people will fill out, save, and return to us. I am wondering if there is a simple method to extract the data from the PDF and store the values in a database (Excel or Access).
I can program in java if needed, but I'd prefer a simpler solution if one exists.

Niall is correct for a direct-connected form. But there are a wider variety of choices that you may want to play with:
Use the distribute and collect features of forms in Acrobat and Acrobat.com. This gives an automated way to bring all of the data together for you. You can see this at http://formcentral.acrobat.com/ or Tools/Forms/Distribute in Acrobat X.
Directly connect your form as Niall suggested; you'll need to reader-extend with the server product to enable this data access with Reader, or use full
Gather the data results from form submissions (HTTP post)
Extract the data out of reader-extended full forms sent back. LiveCycle has a service called Form Data Integration, but there are also open-source toolkits that can extract the data. Also, LiveCycle (now ADEP) can build full processes around the data, but that may be much more than you are looking for.
You may want to start with Acrobat.com's capabilities to see if this is sufficient.

Letter "f" can not be extracted from PDF

When I select words in the PDF and copy and paste them, wherever there is an "f" it always shows "?", and the letter next to the "f" is also missing. This problem shows up not only when I copy and paste but also when other software extracts information from my PDF document. Can anyone tell me what's wrong?
Thanks,
-Maxine

Thanks. I just realized this only shows up in the PDFs I generated using LaTeX. Is there any idea on how to resolve it?
-Maxine

Attributing certain metadata fields from FCs to file exports

Hello
I am looking to encode predefined, personalised metadata fields that are in FCs into the header of exported FCs files. For example, I have hundreds of files from castings and I want to keep certain non-technical info that is in the FCs DB, such as the name, age, date etc., in the header of the file. I am looking at H264, as it seems to be reasonably metadata friendly.
I can't seem to find anything that defines this in the exports.
Any ideas are very welcome.

Take a look at
http://www.sno.phy.queensu.ca/~phil/exiftool/
It may be possible to programmatically write EXIF tags to the QuickTime
files.
Hope this helps.
Nicholas Stokes
XPlatform Consulting

Audio extraction from 400+ QT files with between 1-8 channels

Hi there,
I need to extract all the audio channels individually from more than 400 QT files, with the number of channels ranging from 1 to 8.
Do you know of any script or Automator action out there which could help me? Ideally I want to extract the audio as AIFF.
I've seen these, but as far as I can tell they won't extract each track/channel on its own:
http://developer.apple.com/samplecode/ExtractMovieAudioToAIFF/index.html
http://www.deepniner.net/xtract2wave/
Thanks for any help.
Macbook Pro Mac OS X (10.4.8)

I have brought my export back into PP and I don't see audio. As you suspect, my export settings were incorrect. I failed to set Channels to 8 Channel; it was set to Stereo. I am creating a new test file for the broadcaster.
Thank you!

Audition jumps time code forward on audio extracted from a .mov file

I have a video file with a starting time code of 10:41:17:23.   I open this file in Audition, apply a noise reduction effect, and export the file out as a new .wav file.
However, instead of the audio file having a starting time code of 10:41:17:23, the time code on the file starts at 10:41:34:20.
So, the time stamp has jumped forward 16 seconds, 21 frames.   And, the jump amount is not consistent. Another file had a 37 second plus jump.
Since we are talking about the STARTING time stamp, in all instances, I believe frame rate, codec, etc. are immaterial unless someone can convince me otherwise.
If I was dealing with different frame rates/codecs/etc., I could understand the time code being out of sync later in the file, but I cannot understand why it would be out of whack at the very start of the file.
Does anyone have any ideas on a possible cause or cure? It's a pain having to manually search the audio file for the correct location instead of being able to just punch in the same time code the video uses.

OK. I double-checked, and the timecode showed 23.976 for my test .mov file in Premiere Pro, with a sample rate of 48,000 Hz, 32-bit.
I made sure my timecode default was set to SMPTE 23.976 in Preferences/Time Display in Audition. I then opened the .MOV file in Audition, let it split the audio out, and moved the audio to the editor panel.
The timecode in the audition editor panel shows 23.976. The preview panel also shows 23.976.
For what it's worth, the "time code" block of blue numbers in the lower left of the Audition preview panel shows 00:00:00:00. I believe this is because the primary "Time Code Start" field in the Canon 70D .mov format files is empty and the camera is putting the time code in the Alternate Time Code fields.
I then opened the Effects drop down menu and applied Noise Reduction/Restoration, option Adaptive Noise Reduction, using the preset "Light Noise Reduction."
After the effect had been applied, I saved the modified audio file with a format of Wave PCM, leaving the Sample Type at the default of 48000 Hz Stereo, 32-bit and the Format Settings at Wave Uncompressed 32-bit floating point (IEEE).
I left the box checked next to "Include Markers and Other Metadata" and clicked OK.
I then went to Premiere Pro and opened the just-saved audio file. The starting time code on the file is 10:38:52:22. This does not match the starting time code on the original MOV file, which is 10:38:14:10.
So, based on my limited understanding of both Premiere Pro and Audition, I am getting the exact same sample rate settings and time code settings all the way through the process in both Premiere Pro and Audition up to the point of saving the modified file out of Audition. It is at the point of saving that things get changed.
Charles asked me to post screenshots and a sample file somewhere. I am working on doing those tonight. I just have to figure out where to post them. My original test files were quite large, but I got the same results with a 2 second file in my second test, so I'll post those files someplace.

Metadata Removal from Microsoft Word file ??

Any thoughts? I asked a few Mac Geniuses and they have no idea. I cannot find anything online either.

asked a few Mac Genius' and they have no idea.
Why would they? Word is not made by Apple.
Post Word related questions on Microsoft's own forums for their Mac software, as Apple discussions only provide support for Apple products:
http://www.officeformac.com/productforums

Extract metadata from PDF

Hi all
This may be the wrong category for this question, but I am wondering if it is possible to extract the metadata from a PDF document just as we do from images.
Since both kinds of metadata are based on XMP from Adobe, it should theoretically be possible. Does anybody know how?
Thank you.
Nitai

XMP extraction from PDF format files is not implemented by interMedia.
You can learn more about XMP at the Adobe web site. It is possible to create a simple XMP extractor by implementing a byte scanner that looks for the XMP indicator string. However, I believe that the PDF format allows for some object blocks to be marked old or superseded by newer blocks. Thus it may be possible to have more than one XMP block in a PDF file. You would need to know more about PDF format to determine which is the current block.
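
The byte scanner described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the XMP packets are stored uncompressed and unencrypted and are delimited by the usual `<?xpacket begin=` and `<?xpacket end=` processing instructions, and it simply returns every packet it finds. The class name XmpScanner and the method findPackets are made up for this example, and the sketch makes no attempt to decide which packet is the current one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; a naive scanner, not a PDF parser.
final class XmpScanner {
    private static final String BEGIN = "<?xpacket begin=";
    private static final String END = "<?xpacket end=";

    /** Returns every XMP packet found in the raw bytes, in file order. */
    static List<String> findPackets(byte[] data) {
        // ISO-8859-1 maps each byte to one char, so string offsets equal byte offsets.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        List<String> packets = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = raw.indexOf(BEGIN, from);
            if (start < 0) {
                break;
            }
            int endMarker = raw.indexOf(END, start);
            if (endMarker < 0) {
                break;
            }
            int close = raw.indexOf("?>", endMarker);
            if (close < 0) {
                break;
            }
            int length = close + 2 - start;
            // Decode the packet bytes themselves as UTF-8 (assumed encoding).
            packets.add(new String(data, start, length, StandardCharsets.UTF_8));
            from = close + 2;
        }
        return packets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XmpScanner <file.pdf>");
            return;
        }
        List<String> packets = findPackets(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("XMP packets found: " + packets.size());
        for (String packet : packets) {
            System.out.println(packet);
        }
    }
}
```

If a file yields more than one packet, working out which one is current still requires reading the PDF structure, as noted above; taking the last packet in the file would only be a guess.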

Searchability & metadata extraction on password protected or compressed PDFs

Hello,
I am currently putting together a new document management system at my company (using http://www.alfresco.com )
and part of the functionality will involve automatic metadata extraction from stored PDFs, and Google-like full-text search of PDFs.
We seem to be running into a few problems though -
All our PDFs were password protected in Acrobat to have a "read only" security permission - yet this seems to be blocking the metadata extractor.
I wasn't expecting this - obviously, Adobe Reader can still read the metadata, I'm wondering does it do anything special that other clients aren't allowed to?
Also, I'm wondering if the "object level compression" and "compress text and line art" Distiller settings could interfere with PDF content/metadata readability?

Let me expand a bit on Aandi's comments...
First, PDF documents support two types of metadata. "Classic" metadata (called Document Info) is stored in native PDF data structures and will be encrypted when the rest of the document is, so you can NOT extract the data w/o processing the file as a native PDF document (instead of as "raw data"). "Modern" metadata is stored in a PDF in an industry-standard XML grammar called XMP and will be left in "plaintext" when the rest of the document is encrypted, so that it can be located and extracted w/o the need to process the PDF natively, though native processing is still recommended!
Second, as Aandi points out, "compress text and line art" has been a feature of PDF since day 1, while object compression is a newer feature of PDF introduced with Acrobat 5 (about 7 years ago!). NEITHER of these features will affect the "Modern" metadata, since it is required to always be in "plaintext"; however, the latter CAN affect "Classic" metadata. That is another reason you need to process a PDF natively.
Hope that helps...
Leonard
