How to optimize uptake of uploaded corpora?

When we upload corpora to MS Hub, segments are extracted, aligned, and filtered.  
We define "uptake" as the number of segments that are ultimately used divided by the number of segments initially uploaded.
In our experience, the system actually uses only about 70-75% of the segments, i.e., an uptake of 70-75%. So it seems that 25-30% of our valuable domain-specific corpora are not being leveraged at all.
Example: a single bilingual TMX that shows 142,791 segments in WorldServer appears in Hub with 134,557 segments "extracted", 104,100 "aligned", and 94,730 "used". That is an uptake of only 66% of the original TM; i.e., more than 48,000 segments were ignored!
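The arithmetic behind these figures can be checked stage by stage. The following is a minimal Python sketch that uses only the counts quoted in the example; the stage labels are illustrative names, not Hub terminology beyond "extracted", "aligned", and "used". It computes the overall uptake and how many segments each step drops.

```python
# Segment counts quoted in the example (one bilingual TMX).
stages = {
    "uploaded (WorldServer)": 142_791,
    "extracted": 134_557,
    "aligned": 104_100,
    "used": 94_730,
}


def uptake(used: int, uploaded: int) -> float:
    """Uptake = segments finally used / segments initially uploaded."""
    return used / uploaded


def stage_losses(counts: dict) -> list:
    """For each stage after the first, return (stage, segments lost, fraction of the previous stage lost)."""
    names = list(counts)
    losses = []
    for prev, cur in zip(names, names[1:]):
        lost = counts[prev] - counts[cur]
        losses.append((cur, lost, lost / counts[prev]))
    return losses


uploaded, used = stages["uploaded (WorldServer)"], stages["used"]
print(f"uptake: {uptake(used, uploaded):.1%}, ignored: {uploaded - used:,}")
for stage, lost, frac in stage_losses(stages):
    print(f"{stage}: -{lost:,} ({frac:.1%} of previous stage)")
```

With these counts it prints an uptake of 66.3% (48,061 segments ignored) and shows that the extracted-to-aligned step removes by far the most segments: 30,457, about 22.6% of those extracted.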
When we extract and filter segments from the same TM outside Hub, review the alignments manually, and upload them as .align files, we see no significant improvement: Hub still eliminates 18% of the segments that we _know_ are correctly aligned.
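To see which of the known-good pairs were dropped, rather than only the percentage, the uploaded pairs can be diffed against the pairs Hub reports as aligned. This is a minimal Python sketch that assumes both sets are available as (source, target) strings. The tab-separated layout read by `read_tsv_pairs` is a hypothetical export format, not a documented Hub format. Normalization is applied first so that whitespace or Unicode-composition differences are not mistaken for drops.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC and collapse runs of whitespace so trivially different copies compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def read_tsv_pairs(path: str) -> list:
    """Read 'source<TAB>target' lines (hypothetical export format)."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "\t" in line:
                src, tgt = line.rstrip("\n").split("\t", 1)
                pairs.append((src, tgt))
    return pairs


def dropped_pairs(uploaded: list, kept: list) -> list:
    """Return the uploaded pairs whose normalized form does not appear among the kept pairs."""
    kept_keys = {(normalize(s), normalize(t)) for s, t in kept}
    return [(s, t) for s, t in uploaded if (normalize(s), normalize(t)) not in kept_keys]
```

Typical use would be `dropped_pairs(read_tsv_pairs("uploaded.tsv"), read_tsv_pairs("hub_aligned.tsv"))`, where both file names are placeholders. The resulting list gives concrete examples to inspect or send to support.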
Questions:
1. How can we improve the uptake of corpus segments, i.e., increase the proportion that the system actually uses?
2. What criteria does the system use to eliminate segments?
3. How can we override the system's alignment step?
Thanks!

Thanks for your reply!
Re #3: I guess I wasn't clear in my question. I trained the same system twice: once with a TM, and again with segments extracted and aligned from that same TM. So I _did_ supply perfectly aligned .align files, and the system STILL eliminated 18% of the segments in them (the difference between the "extracted" and "aligned" counts). This is why I am puzzled.
Update: I extracted the aligned segments from the TMX files with a separate script; the numbers I report are for those segments, not for the extracted sentences that Hub provides.
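The separate extraction script is not shown in the thread. As an illustration of that kind of extraction, here is a minimal Python sketch. It assumes standard TMX structure (`tu` containing `tuv` elements with `xml:lang`, each holding a `seg`) and assumes that inline code elements (`bpt`, `ept`, `it`, `ph`, `ut`) should be dropped from the text. Writing the pairs out in Hub's .align format is left out, since that format isn't described here.

```python
import xml.etree.ElementTree as ET

# Attribute name ElementTree uses for xml:lang (TMX 1.4); TMX 1.1 files use plain "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# TMX inline elements that carry native formatting codes rather than translatable text.
CODE_TAGS = {"bpt", "ept", "it", "ph", "ut"}


def seg_text(el: ET.Element) -> str:
    """Concatenate the text of a <seg>, skipping the content of inline code elements."""
    parts = [el.text or ""]
    for child in el:
        if child.tag not in CODE_TAGS:
            parts.append(seg_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def tmx_pairs(root: ET.Element, src_lang: str, tgt_lang: str) -> list:
    """Return (source, target) pairs from every <tu> with non-empty segments in both languages.
    Languages match by prefix, so "en" matches "en-US"."""
    pairs = []
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[lang] = " ".join(seg_text(seg).split())
        src = next((t for code, t in by_lang.items() if code.startswith(src_lang.lower())), "")
        tgt = next((t for code, t in by_lang.items() if code.startswith(tgt_lang.lower())), "")
        if src and tgt:
            pairs.append((src, tgt))
    return pairs
```

For a file on disk, the call would look like `tmx_pairs(ET.parse("memory.tmx").getroot(), "en", "it")`; the file name is a placeholder.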
Re #2: Thanks for the hints. I do see one- and two-word segments being eliminated. But most of the eliminated segments do not match the criteria you mentioned: they contain no tags, the alignment is correct, the segment is not a duplicate (I only checked segments that were no longer present at all), and the length is fine. Yet they are not in the set of aligned sentences.
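One way to separate drops that have a plausible explanation from the ones that don't is a heuristic pre-screen over the dropped pairs. The following Python sketch flags pairs that match the criteria discussed in this thread: very short segments, tags, duplicates, and implausible length ratios. It is not Hub's actual filter. The tag pattern, the 3-word minimum, and the 2.0 length ratio are illustrative guesses.

```python
import re
from collections import Counter

# Markup-like tags or {0}-style placeholders; the exact patterns are an assumption.
TAG_RE = re.compile(r"<[^>]+>|\{\d+\}")


def flag_reasons(pairs: list, min_words: int = 3, max_ratio: float = 2.0) -> dict:
    """Map each pair index to the heuristic reasons it might be filtered.
    Thresholds are illustrative guesses, not Hub's rules."""
    counts = Counter(pairs)
    flags = {}
    for i, (src, tgt) in enumerate(pairs):
        reasons = []
        if len(src.split()) < min_words:
            reasons.append("short")
        if TAG_RE.search(src) or TAG_RE.search(tgt):
            reasons.append("tags")
        if counts[(src, tgt)] > 1:
            reasons.append("duplicate")
        shorter, longer = sorted((len(src), len(tgt)))
        if shorter == 0 or longer / shorter > max_ratio:
            reasons.append("length ratio")
        if reasons:
            flags[i] = reasons
    return flags


def unexplained(pairs: list, **kwargs) -> list:
    """Pairs that match none of the heuristics: the drops worth reporting as unexplained."""
    flags = flag_reasons(pairs, **kwargs)
    return [p for i, p in enumerate(pairs) if i not in flags]
```

Run over the list of dropped pairs, `unexplained` yields a set of examples, like the one below, that none of these criteria account for.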
For example, this English-Italian pair was eliminated:
The image you uploaded no longer exists.
L’immagine caricata non esiste più.
Any further information or examples that you can provide will be appreciated!

Similar Messages

  • How do I add image upload to a web app edit template?

    How do I add image upload to a web app edit template? When creating fields, I select Image as the field type, but the only way to upload an image is when I create the web app item within the admin. The option to upload an image is not available when the web form opens for the user to submit.
    I won't send any more of these questions through this email, but I really needed assistance.
    Thanks,
    Gordon

    On the Details tab of the Web App setup, under Web App Item Options; have you ticked "Allow File Upload" and specified a Default Upload Folder?

  • How do I create an 'upload facility' for my website using Muse?

    How do I create an 'upload facility' for my website using Muse? We are a print company, and this is how we receive most of our work.
    Iain

    Hey Iggy,
    These might help:
    Muse
    http://tv.adobe.com/watch/introducing-business-catalyst/getting-started-with-business-catalyst-and-muse-what-is-business-catalyst
    http://tv.adobe.com/watch/introducing-business-catalyst/getting-started-with-business-catalyst-and-muse-creating-and-publishing-sites
    -Dave

  • How do I share files uploaded into the Creative Cloud with other Creative Cloud members?

    How do I share files uploaded into the Creative Cloud with other Creative Cloud members?

    Should be easy.  Try this...
    In Thumbnails view, click the little triangle (pointing downward) in the lower-right corner of the asset you want to share.
    In the blue icon bar that appears, click the Share icon (the third icon from the left, just right of the trash icon). The Share dialog should pop up.
    In the Share dialog, enter the email address of the person with whom you want to share the asset, then click the Send Email button - they'll receive an email with a link to your asset. OR
    You can also copy a link to the asset and then paste that into your own email client if you prefer.  To do that, click the Link icon (looks like a "chain", and is to the right of the email "envelope" icon) - then click the Copy Link button.
    Note that the Share options won't be available if your asset is set to "Private" - you can control whether an asset can be viewed (or downloaded) by others by clicking the Public/Private control (green or red "lock" icon).
    You can also access the same Share controls if you click on the file to see it one-up (you can do this from either Thumbnails view or List view); click the Share icon near the upper right corner of the browser window (to the right of the asset name).
    Hope that helps.

  • How can I implement file upload using the Tomcat library?

    How can I implement file upload using the Tomcat library?

    Did you read the document for the library?
    If you can't figure it out, why don't you ask the people who provide the library?
    This has nothing to do with JavaMail.

  • How can I automatically check a PDF uploaded for press on a website?

    How can I automatically check a PDF uploaded for press on a website?
    I am making a new website for a printer. His clients upload PDFs directly to his website. If a PDF is not the way the printer needs it for printing, we want the site, after checking the uploaded PDF's profile, to automatically open a window explaining what is wrong with the PDF
    and, if possible, to automatically fix whatever it can, as the PitStop software does offline.
    PLEASE HELP
    Thank you in advance

    Acrobat isn't available with a server license. You might like to look into PitStop Server.

  • How do I make an uploaded document, like an order form, work in Pages, and how do I save it to a bookmark?

    How do I make an uploaded document, like an order form, work in Pages, and how do I bookmark it for later use?

    Pages can work with Word documents. It can save the results as either PDF or Word documents.
    Allan

  • In Dreamweaver MX 2004 and Dreamweaver CS4, how do I configure downloads/uploads so that they don't ask me to include dependent files but proceed without including them?

    In Dreamweaver MX 2004 and Dreamweaver CS4, how do I configure downloads/uploads so that they don't ask me to include dependent files but proceed without including them?

    Open the Preferences panel (Edit menu on Windows, Dreamweaver menu on a Mac), and select the Site category. There are two checkboxes there for dependent files. Make sure both are selected. The Dreamweaver default is NOT to upload/download dependent files. You need to click Yes, if you want the dependent files to be included.

  • How do I email or upload my entire iTunes library so I can put it on a diff

    How do I email or upload my entire iTunes library so I can put it on my laptop?
    I'm getting ready for a party and realised I need to move music onto my laptop!
    I've tried to email it, but file attaching only allows one song at a time, and it would take me years to do that!
    Any ideas? ASAP!
    Thanks, Holly

    Holly:
    I'm assuming that you have your iTunes library on a desktop machine and want to move your songs to your laptop. You have a couple of choices. If your library is small, you can create a backup copy onto a Data CD then copy the CD's contents onto your laptop.
    1) Go to Preferences, select the Burning tab, and click the Data CD option.
    2) Create a new playlist and copy the contents of the library to this new playlist.
    3) Right-click the new playlist and select Burn Playlist to Disc.
    4) Once the CDs are burned on your desktop, take them over to the laptop and copy them into your iTunes Music folder on the laptop.
    5) Launch iTunes on your laptop and drag the tracks from the iTunes Music folder onto iTunes.
    If your library is larger (i.e., it won't fit on just a couple of data CDs) and your iPod is less than half full, you can use the iPod to quickly transfer the songs:
    On your desktop:
    1) Launch iTunes and go to Edit ... Preferences. Leave the window open
    2) Plug in your iPod to the desktop machine
    3) Once it's active on your desktop, open My Computer and you will see your iPod mounted as a "Removable Disk". Open this icon up and you will see several folders such as "Calendars", "Contacts", etc. Right-click within this window and select New ... folder to create a new folder on your iPod.
    4) Open that new folder up
    5) Go to your iTunes Music folder and copy its contents into the new folder you created above.
    6) Once completed, go to iTunes and cancel out of the Preferences window. Your iPod will update and then will dismount. (If you've changed your setting to keep the iPod mounted at all times, then you will need to right-click on your iPod entry to eject it.)
    Go over to your laptop and do the following:
    1) Launch iTunes and go to Edit ... Preferences. Leave the window open
    2) Plug in your iPod to the laptop
    3) Once it's active on your laptop, open My Computer and you will see your iPod mounted as a "Removable Disk". Open this icon up and you will see the new folder you just created above.
    4) Open that new folder up and copy all your tracks to the iTunes Music folder on your laptop
    5) Cancel out of the preferences window. Your iPod should unmount.
    6) With iTunes open on your laptop, click on Library.
    7) Go to the folder on your laptop that contains all your songs, select all your tracks and drag them into iTunes.

  • How do I delete my uploaded files? I am on an iPad.

    How do I delete my uploaded files? There is no trash button.

    You must run the official iOS 5 release, not a developer version, which is likely a beta. Use iTunes 10.5, connect your device, let it make the backups, and then click the "Restore" button in iTunes. This will download and install the one and only supported OS for your device.
    Once you have this on it, you can install all apps, including GarageBand.

  • How to optimize xquery expression ?

    Hi,
    I have a Berkeley DB XML database with two containers: dicom.dbxml and instancemetadata.dbxml.
    dicom.dbxml contains documents like the following:
    <?xml version="1.0" encoding="UTF-8"?>
    <instance docid="dicom_1009">
         <dicom_item>
              <dicom_header>
                   <dicom_tag group="0002" element="0000" vr="UL">194</dicom_tag>
                   <dicom_tag group="0002" element="0001" vr="OB"/>
                   <dicom_tag group="0002" element="0002" vr="UI">1.2.840.10008.5.1.4.1.1.2</dicom_tag>
                   <dicom_tag group="0002" element="0003" vr="UI">2.16.840.1.113662.2.1.4519.41582.4105152.419990505.410523251</dicom_tag>
                   <dicom_tag group="0002" element="0010" vr="UI">1.2.840.10008.1.2.1</dicom_tag>
                   <dicom_tag group="0002" element="0012" vr="UI">2.16.840.1.113662.2.1.1</dicom_tag>
                   <dicom_tag group="0002" element="0016" vr="AE">PHOENIXSCP</dicom_tag>
              </dicom_header>
              <dicom_body>
                   <dicom_tag group="0008" element="0000" vr="UL">596</dicom_tag>
                   <dicom_tag group="0008" element="0005" vr="CS">ISO_IR 100</dicom_tag>
                   <dicom_tag group="0008" element="0008" vr="CS">ORIGINAL\PRIMARY\AXIAL</dicom_tag>
                   <dicom_tag group="0008" element="0012" vr="DA">1999.05.05</dicom_tag>
                   <dicom_tag group="0008" element="0013" vr="TM">10:52:34.530000</dicom_tag>
                   <dicom_tag group="0008" element="0016" vr="UI">1.2.840.10008.5.1.4.1.1.2</dicom_tag>
                   <dicom_tag group="0008" element="0018" vr="UI">2.16.840.1.113662.2.1.4519.41582.4105152.419990505.410523251</dicom_tag>
                   <dicom_tag group="0008" element="0020" vr="DA">1999.05.05</dicom_tag>
                   <dicom_tag group="0008" element="0021" vr="DA">1999.05.05</dicom_tag>
                   <dicom_tag group="0008" element="0022" vr="DA">1999.05.05</dicom_tag>
                   <dicom_tag group="0008" element="0023" vr="DA">1999.05.05</dicom_tag>
                   <dicom_tag group="0008" element="0030" vr="TM">10:52:34.530000</dicom_tag>
                   <dicom_tag group="0008" element="0031" vr="TM">10:52:34.530000</dicom_tag>
                   <dicom_tag group="0008" element="0032" vr="TM">10:52:34.530000</dicom_tag>
                   <dicom_tag group="0008" element="0033" vr="TM">10:52:32.510000</dicom_tag>
                   <dicom_tag group="0008" element="0060" vr="CS">CTTR</dicom_tag>
              </dicom_body>
         </dicom_item>
    </instance>
    instancemetadata.dbxml contains documents like the following:
    <?xml version="1.0" encoding="UTF-8"?>
    <instancemetadata xmlns="imuba.med" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="imuba.med Instancemetadata.xsd">
              <name/>
              <notes/>
              <id>instancemetadata_1</id>
              <instanceid>dicom_1</instanceid>
              <createusername>dd</createusername>
              <createdate>Tue May 02 21:08:06 CEST 2006</createdate>
              <lastmodusername>dd</lastmodusername>
              <lastmoddate>Tue May 02 21:08:06 CEST 2006</lastmoddate>
         </instancemetadata>
    and I have this XQuery expression:
    declare namespace n = "imuba.med";
    declare variable $insCont external;
    for $ins in collection(concat(concat("dbxml:containers/", string($insCont)),".dbxml"))/instance,
         $met in collection("dbxml:containers/instancemetadata.dbxml")/n:instancemetadata
    where
    $ins/dicom_item/dicom_body/dicom_tag[@group='0008' and @element='0060'] = "CTTR" and
    $ins/@docid = $met/n:instanceid
    return
    <row>
    { $ins/@docid }
    { $met/n:name }
    { $met/n:notes }
    { $met/n:id }
    { $met/n:instanceid }
         { $met/n:createusername }
    { $met/n:createdate }
    { $met/n:lastmodusername }
    { $met/n:lastmoddate }
    </row>
    With 5,000 documents in the dicom container, the XQuery takes close to 10 seconds to execute. I've tried to create indexes using these commands:
                        XmlIndexSpecification is = xcDicom.getIndexSpecification();
                        is.addIndex("", "docid", "unique-node-attribute-equality-string");
    and
                        XmlIndexSpecification iss = xcIns.getIndexSpecification();
                        iss.addIndex("imuba.med", "instanceid", "unique-node-element-equality-string");
    After that, the execution time is about 7-8 seconds, but that is still long (the database contains only 5,000 documents).
    Do you have any idea how to optimize it? I suppose an index on the element I'm using in the WHERE clause (dicom_item/dicom_body/dicom_tag[@group='0008' and @element='0060']) would help, but I haven't found a way to add an index on an element identified by an XPath expression.
    Thanks for any help,
    Darek

    Hi Darek,
    First off, why not try adding these indexes to see what happens:
    is.addIndex("", "dicom_tag", "node-element-equality-string");
    is.addIndex("", "group", "node-attribute-equality-string");
    is.addIndex("", "element", "node-attribute-equality-string");
    Secondly, what storage model are you using? I would expect you to get better query times using a NodeContainer, with the DBXML_INDEX_NODES flag enabled.
    Thirdly, your "instance" document is not very XML-like, so you will struggle to get very good query times using that format. If you have control over the format of the document, I would suggest incorporating one or more of the "group", "element", and "vr" attributes into the name of the element, so that you will get multiple elements with different names instead of one element name with multiple permutations of attributes. Selecting an element by name will always be faster than selecting it by some kind of value.
    Let me know how you get on with these suggestions,
    John
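John's third suggestion, folding the attributes into the element name, can be done as a one-off conversion before loading documents into the container. This is a minimal Python sketch using `xml.etree.ElementTree`. The `tag_GGGG_EEEE` naming scheme is a hypothetical choice; a prefix is needed because XML element names cannot start with a digit.

```python
import xml.etree.ElementTree as ET


def rename_dicom_tags(root: ET.Element, prefix: str = "tag") -> ET.Element:
    """Rename <dicom_tag group="G" element="E" ...> to <{prefix}_G_E ...>, keeping other attributes (e.g. vr)."""
    for el in list(root.iter("dicom_tag")):  # materialize first, since tags change below
        group, element = el.get("group"), el.get("element")
        if group is None or element is None:
            continue  # leave entries without both attributes untouched
        del el.attrib["group"]
        del el.attrib["element"]
        el.tag = f"{prefix}_{group}_{element}"
    return root
```

After converting, for example with `tree = ET.parse("dicom_1009.xml"); rename_dicom_tags(tree.getroot()); tree.write("out.xml", encoding="UTF-8", xml_declaration=True)` (placeholder file names), the WHERE condition could become `$ins/dicom_item/dicom_body/tag_0008_0060 = "CTTR"`. That condition should be servable by a plain element equality index written in the same form as John's suggestions, e.g. `is.addIndex("", "tag_0008_0060", "node-element-equality-string")`.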

  • How can I make MUSE upload one png that it overlooks? Cache cleared & shows in Preview

    Muse uploads everything to BC except one imported PNG. I cleared the cache; the image shows in Preview, no conflicts are noted in Assets, and the browser reflects any other changes I make to the page. Online, the space for the image is missing and the entire page moves up to fill it.

    Thanks for getting back to me. I had a deadline, so I gave up trying to make that (great) extracted image work and, out of frustration, deleted it from my system. For what it's worth, Muse displayed the PNG in Design, Preview, and Plan views and left an empty space for it when published. Besides this quirk, Muse has been great.
    Earlier reply from adobelance (Help with using Adobe Muse, Mon, 27 Aug 2012):
    Could you share your URL so we can take a look at your site?

  • How can I upload my movies from the iPad to iCloud?

    How can I upload my movies from the iPad to iCloud?

    If these are videos taken with your iPad, you can only add them to iCloud by adding them to a shared stream, as discussed here: http://help.apple.com/icloud/#/mmc0cd7e99.
    If these were movies purchased from the iTunes Store, they are already in iCloud and can be redownloaded for free in many countries, as long as the movie is still available and the studio allows it. See http://support.apple.com/kb/HT2519 and iTunes in the cloud availability by country here: http://support.apple.com/kb/HT5085.
    To be safe, import them to your iTunes library so you know you have them. To do this, connect your iPad to your computer, open iTunes, and go to File>Devices>Transfer Purchases. Then you can sync them back to your iPad later if you ever need to.

  • How can I save and upload my resume from my iPad

    How can I save and upload my resume from an iPad?

    What program are you using to create word processing documents? Also, you can try Dropbox: https://itunes.apple.com/us/app/dropbox/id327630330?mt=8

  • How can I view and upload videos to shared photo streams on a Mac?

    I use shared photo streams a lot. How can I view videos uploaded to a shared photo stream on my Mac?

    You cannot at this time; that is currently only an iOS 7 and PC feature. Maybe it will come with future software releases.
    http://support.apple.com/kb/TS4379?viewlocale=en_US&locale=en_US
    What software versions do I need to be able view and share videos, and contribute to other people's shared streams?
    iPhone, iPad, or iPod touch with iOS 7.0 or later
    PC with Windows 7 or later and the iCloud Control Panel 3.0 or later
    AppleTV (second generation) with Apple TV Software Update 6.0 or later
    If the owner of the shared stream has turned on the Public Website option and shares the link with you, you will be able to view videos and see photos and video added by all contributors from any up-to-date browser.
    LN
