Optimisation of a Name Matching Routine

Good morning,
I'm developing an SSIS package to match the names of pupils that I have stored in one table. All the pupils in this table were loaded from 5 different databases.
The package dataflow is:
1. Truncate the destination table, in which I'll store the Id of the source pupil, the Id of the candidate pupil and the matching percentage.
2. After emptying the destination table, I split the execution flow to extract the Ids of the pupils, by database, into 5 different "Recordset Destination" components.
3. Each loop then uses the Ids stored in its "Recordset Destination" to extract the pupil information and compare it with the data of the rest of the pupils.
My package is:
The process done inside each loop is the following:
1. I extract all the pupils whose Double Metaphone key matches.
2. With all the pupils extracted by the OLE DB Source, I compare their data and calculate the matching percentage inside the "Script Component".
3. Finally, if the matching percentage is greater than the threshold defined in an SSIS variable, I store the data in the Matching table.
Here is the dataflow:
What I want to ask is whether you have any idea how to optimise the package, because I have to compare 15 million pupils and I've estimated that the process will take more than 1 week!
I'm sure that I'm doing something wrong, but I've tried different data flows and transformations and the time is the same.
If you need more information to suggest any changes I could make, just tell me.
Thank you!

Thank you for your comments ArthurZ.
I'm matching names with the Double Metaphone algorithm plus an N-Gram algorithm. When I dump the data from the source databases to the mirror databases (DB1, DB2...), I calculate the keys of each algorithm and store them with the pupil information (Name, Surname, DOB, Gender, Postcode...).
With this information stored, I extract it with the package that you see in the pictures. After extracting the information of the pupil, I extract in the loops the pupils with the same Double Metaphone keys and calculate the matching percentage.
In the first Data Flow Tasks (Pupil To Compare - Id Extraction), I extract the source pupils to compare and store their Ids in a "Recordset Destination". After that, in each loop, I use this Recordset Destination to compare the source pupil with the rest of the candidate pupils.
I've looked into using Fuzzy Lookup or Fuzzy Grouping, but they are not useful in my case because I don't need to group and clean the data.
I think that I'm losing performance by using the recordsets in the loops. Any suggestions on how I can avoid the loops?
Many thanks!
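
One way to avoid the per-pupil loops is to treat the stored Double Metaphone key as a blocking key: group all pupils by key once, then compare pairs only inside each group. Below is a minimal sketch in Python (not SSIS), assuming the keys are already stored with each pupil as described above. The field names, the sample key values, the bigram Dice score and the 0.8 threshold are illustrative assumptions, not the scoring used in the package.

```python
from collections import defaultdict
from itertools import combinations

def bigrams(text):
    """Return the set of character bigrams of a lower-cased string."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def dice(a, b):
    """Dice coefficient on character bigrams, between 0.0 and 1.0."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings too short to have bigrams: fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def match_pupils(pupils, threshold=0.8):
    """pupils: iterable of dicts with 'id', 'name' and a precomputed 'dm_key'.
    Returns (source_id, candidate_id, score) for pairs at or above threshold."""
    blocks = defaultdict(list)
    for p in pupils:
        blocks[p["dm_key"]].append(p)      # one pass to build the blocks
    matches = []
    for group in blocks.values():
        for a, b in combinations(group, 2):  # each pair compared once
            score = dice(a["name"], b["name"])
            if score >= threshold:
                matches.append((a["id"], b["id"], score))
    return matches

# Hypothetical sample rows; the dm_key values are made up for illustration.
pupils = [
    {"id": 1, "name": "Jon Smith", "dm_key": "JNSM"},
    {"id": 2, "name": "John Smith", "dm_key": "JNSM"},
    {"id": 3, "name": "Ann Lee", "dm_key": "ANL"},
]
print(match_pupils(pupils))
```

In database terms the same idea is a single self-join on the key column with a condition such as a.Id < b.Id, so the engine builds the candidate pairs in one set-based pass instead of one query per pupil. Whether that fits the Script Component scoring is something to test on a sample first.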

Similar Messages

  • Why don't my iPhoto event names match up to my iPhoto master file names?

    Yosemite 10.10
    iPhoto 9.6
    MacBook Pro 2009
    I recently downloaded Yosemite, which has slowed my laptop down significantly. This has prompted me to transfer all large files - photos, movies and music off my laptop and onto an external hard drive (in addition to my time machine backup hard drive).
    I was instructed (at the Apple Store) to open Finder then Pictures > iPhoto Library. Then to right click on the iPhoto Library folder and select 'show package contents', this reveals a large list of folders but I understand that the Masters folder is the one to care about with respect to back ups. When I open up Masters I can see 8 folders labelled 2007 - 2014. In each of these folders are other folders housing all the jpeg files.
    THE PROBLEM:
    For some reason in the 2007-2009 folders the Master file names match up to the Event names in iPhoto. This is great and makes sense! However, after 2009 everything gets really messed up in the Masters folder - despite my iPhoto Library being meticulously organised into Events that are all labelled. For example, when I open up the 2014 folder the path looks like this 2014 > 10 > 06 > 20141006-214510 > which then houses the jpeg images. It looks to me like these folders represent upload events as opposed to the Events that I organised them into in the front end of iPhoto. WHY!?!
    THE QUESTIONS:
    Why is this happening? And can I fix it so that it doesn't happen when I next import photos into iPhoto?
    Should I just stop using iPhoto?
    What happens if I change the names of the Master folders in the back end of iPhoto to reflect the Event names in the front end? Would it be better to just organise them on my hard drive?
    Thank you

    1. It's not a problem as you never ever access your photos in this way.
    2. If you want to back up your original photos you do this via the Export function: File -> Export
    This User Tip
    https://discussions.apple.com/docs/DOC-4921
    has details of the options in the Export dialogue.
    3. None of this saves space on your HD. To do that you need to move your Library from the HD to the external:
    Make sure the drive is formatted Mac OS Extended (Journaled)
    a. Quit iPhoto
    b. Copy the iPhoto Library from your Pictures Folder to the External Disk.
    c. Hold down the option (or alt) key while launching iPhoto. From the resulting menu select 'Choose Library' and navigate to the new location. From that point on this will be the default location of your library.
    d. Test the library and when you're sure all is well, trash the one on your internal HD to free up space.
    4. That particular Apple Genius isn't.
    As to the specific questions:
    Why is this happening? And can I fix it so that it doesn't happen when I next import photos into iPhoto?
    It's happening because that is how iPhoto works, and there is nothing to fix. Exporting makes the issue redundant.
    Should I just stop using iPhoto?
    Why?
    What happens if I change the names of the Master folders in the back end of iPhoto to reflect the Event names in the front end?
    You'll corrupt the Library. And it's unnecessary.
    Would it be better to just organise them on my hard drive?
    No.

  • "Service Name" and "Routine Name" in tmadmin

    In the tmadmin console, there are "Service Name" and "Routine Name". What is the difference between them? Are they related to "Server Name"? Thx.

    See http://download.oracle.com/docs/cd/E13161_01/tuxedo/docs10gr3/rfcm/rfcmd.html#wp1750061
    In 'buildserver' when you use the "-s" option you can specify the service name and optionally the routine/function name. The function/routine name is compiled and included in the server executable that 'buildserver' creates.
    Typically the service name and routine name are the same.
    Harvey

  • How to make filename & version name match!

    Is there no way to make the file name match the version name after importing? Greg
    Message was edited by: gnpToday

    up,
    this feature could save my life

  • Display only exact name matches

    CS6 / OS 10.8.2
    I have my media folder on an external firewire drive.
    When I check Display only exact name matches, all other files do not get greyed out, as they should.

    Same here. I've found that using "page up" / "page down" flushes the bug.

  • When will VersaMail be able to SAN name match?!?!

    I am sure I am not alone here in my problem.
    My company has an address to access and sync with Outlook for mobile users that is mobile.companyname.com. The problem is that the SSL has an alternate name list and mobile.companyname.com is way down on the list. Apparently VersaMail only checks the first name on the list to see if it matches the server you entered. When I try to sync VersaMail sees that the first entry doesn't equal mobile.companyname.com and spits back a certificate error. I contacted IT at my job and they only support company issued Blackberrys so I am SOL.
    When will VersaMail support name matching so that I can once again sync with my work email????
    Post relates to: Centro (Sprint)
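
    The behaviour described above (only the first name on the certificate's alternate name list is checked) can be contrasted with checking every entry. This is a small sketch in Python, not VersaMail's actual code; apart from mobile.companyname.com, the host names are hypothetical, and wildcard entries are deliberately ignored.

```python
def matches_first_only(hostname, san_names):
    """Accept the host only if it equals the first alternate name."""
    return bool(san_names) and san_names[0].lower() == hostname.lower()

def matches_any(hostname, san_names):
    """Accept the host if it equals any alternate name on the list."""
    host = hostname.lower()
    return any(name.lower() == host for name in san_names)

# Hypothetical alternate name list with the mobile host far down the list.
sans = [
    "mail.companyname.com",
    "autodiscover.companyname.com",
    "mobile.companyname.com",
]
print(matches_first_only("mobile.companyname.com", sans))
print(matches_any("mobile.companyname.com", sans))
```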

    I have encountered the same problem. My company went to Exchange Server 2007 and Outlook 2007 and when I could no longer get e-mail on my phone, this was the culprit. I use Chatteremail but the issue apparently is Palm OS. Grrr
    "One of the limitations we've uncovered is the Palm OS's
    inability to handle SAN (Subject Alternative Name) certificates. Rather than
    having multiple front end servers and many different SSL certificates for
    the services that Exchange 2007 provides, we chose to configure the system
    with one SAN SSL certificate because it simplified connectivity to the services for users, kept the architecture simpler and cost less."
    Post relates to: Treo 755p (Verizon)

  • Select flat file name using routine

    Hi experts!
    I am trying to write a routine in the InfoPackage for flat file extraction that selects the flat file automatically according to the date. I always need to load the file of the previous week. Please help me correct the code. The file name is: DatAuftragsbestandSeiten_W(number of week).fix
    For example: DatAuftragsbestandSeiten_W16.fix
    Thank you for your help!
    program filename_routine.
    * Global code
    *$*$ begin of global - insert your declaration only below this line  *-*
    * Enter here global variables and type declarations
    * as well as additional form routines, which you may call from the
    * main routine COMPUTE_FLAT_FILE_FILENAME below
    *TABLES: ...
    * DATA:   ...
    DATA: Str1 value '/strans/appl/anzeigen_bw/DatAuftragsbestandSeiten_W',
    Str3 value '.fix'.
    DATA: iweek(2).
    call function 'WEEKNR_GET'
      EXPORTING
        DATE         = sy-datum
      IMPORTING
        WEEK+4(2)    = iweek.
    iweek = iweek - 1.
    *$*$ end of global - insert your declaration only before this line   *-*
    form compute_flat_file_filename
      using    p_infopackage  type rslogdpid
               p_datasource   type rsoltpsourcer
               p_logsys       type rsslogsys
      changing p_filename     type RSFILENM
               p_subrc        like sy-subrc.
    *$*$ begin of routine - insert your code only below this line        *-*
    * This routine will be called by the adapter,
    * when the infopackage is executed.
      p_filename =
    *....Concatenate str1 iweek str3 into p_filename.
      p_subrc = 0.
    *$*$ end of routine - insert your code only before this line         *-*
    endform.

    hi Doris,
    try
    in global routine
    data : l_week type SCAL-WEEK,
           i_week(2).
    in form compute_...
    data : Str1 type string,
           Str3 type string.
           str1 = '/strans/appl/anzeigen_bw/DatAuftragsbestandSeiten_W'.
           str3 = '.fix'.
           call function 'DATE_GET_WEEK'
              exporting
                 date = sy-datum
              importing
                 week = l_week.
              i_week = l_week+4(2).
              i_week = i_week - 1.
              if strlen( i_week ) = 1.
                 concatenate str1 '0' i_week str3 into p_filename.
              else.
                 concatenate str1 i_week str3 into p_filename.
              endif.
    hope this helps.
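
    One thing to watch with "week number minus 1": in the first week of a year it yields 00 rather than the last week of the previous year. A way around that is to subtract seven days from the date first and only then take the week number. Here is a minimal sketch of that date logic in Python rather than ABAP, assuming ISO week numbering is what the file names use; the path prefix and suffix are taken from the post.

```python
from datetime import date, timedelta

PREFIX = "/strans/appl/anzeigen_bw/DatAuftragsbestandSeiten_W"
SUFFIX = ".fix"

def previous_week_filename(today):
    """Build the file name for the ISO week before the one containing `today`."""
    last_week = today - timedelta(days=7)   # step back a full week first
    week = last_week.isocalendar()[1]       # ISO week number, 1..53
    return f"{PREFIX}{week:02d}{SUFFIX}"    # zero-pad to two digits

print(previous_week_filename(date(2024, 4, 24)))  # a date in ISO week 17
print(previous_week_filename(date(2024, 1, 3)))   # a date in ISO week 1
```

    The first call gives the W16 file and the second gives W52, because stepping back by date crosses the year boundary correctly.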

  • E50 number to contact name matching bug?

    Hello,
    I just got an E50 (RM-170, software V 07.36.00). 
    It seems to have a very irritating bug in the way numbers are matched to contacts in the 'recent calls' lists:
    - an entry in the list will show the contact name *only if* the number appears only once in the contact list;
    - otherwise the entry only shows the number. 
    This happens all the time with my well organized (synced with my PC) contact list, where 2 members of the same family typically have entries with different mobile numbers but the same landline number. In that case the landline number will never be matched to a contact name.
    To me this is not a 'feature', but definitely a bug. All other (non-Nokia) phones I have owned have the much more usable behaviour of showing in the calls list the name for the first (or last - who cares) contact that matches a call number.
    Is there a software update that fixes this bug for the E50?
    -- FL 

    cjlim wrote:
     What you are supposed to do is to save all the different number for a contact under the same contact name eg. Joe Soap - Home, Joe soap - mobile etc.
     Well this is exactly what I do: I have
     - Joe Soap
        - Home (xxxxxxxxx)
        - Mobile (yyyyyyyyy)
    But I have also:
     - Jane Soap
        - Home (xxxxxxxxx)
        - Mobile (zzzzzzzzz)
     And this is where the Nokia bug hits:
    When Joe or Jane Soap call me from their home number xxxxxxxxx (which is the same for both contacts):
    - any other (sensible)  phone would match xxxxxxxxx to a contact (Joe or Jane Soap - I don't really care which Soap);
    - but the Nokia insists on showing the bare number xxxxxxxxx, without matching it to a contact.

  • CAD SQL Instance Name match

    Looking for clarification on CAD installation with SQL Data Store. The CAD Installation Guide states "It is required that your system include a separate SQL Server instance that hosts the CAD base services (on both servers in a replicated system)." This is under the Configuring Microsoft SQL Server 2005 for CAD section.
    Do the SQL instance names for the side A and B servers need to match? I believe the answer is yes but want to make sure.

    Yes, use the same name.  A few steps down there is the statement:
    "NOTE: It is recommended that you name the SQL instance that CAD will use CADSQL. Although, you can name this SQL instance anything you would like, the directions for this guide are written for use with CADSQL. You might have to deviate from these instructions if you do not use the name CADSQL."

  • Find objects associated with Technical names of routines

    Is there an easy way to find what objects, TR/UR, are associated with the cryptic technical names given in a transport list of Routines and Update Rules?
    e.g.:
    -Routine                     
       |_0BCJXNMRQHE29898UDAQN1TOY
    -Update rules                
       |_20ZUEIGNKD42XNU4HPWJEZ75H
    Thanks!

    Hi Jolene,
    You can try to look up tables RSUPDROUT and RSUPDINFO.
    Hope this helps...

  • XML node name matching with regular expressions

    Hello,
    If I have an XML file that has the following:
         <parameter>
              <name>M2-WIDTH</name>
              <value column="09" date="2004-10-31T19:56:30" row="03" waferID="PUK444150-20">10.4518</value>
         </parameter>
         <parameter>
              <name>M2-GAP</name>
              <value column="29" date="2004-10-31T19:56:30" row="06" waferID="PUK444150-03">2.864</value>
         </parameter>
         <parameter>
              <name>RES-LENGTH</name>
              <value column="29" date="2004-10-31T19:56:30" row="06" waferID="PUK444150-03">2.864</value>
         </parameter>
    Is there any way I can get a list of nodes that match a certain pattern, say where name=M2*?
    I can't seem to find any information on how to match a regular expression. I see how you can do:
    String expression="/parameter[@name='M2-LENG']/value/text()";
    NodeList nodes = (NodeList) xPath.evaluate(expression, inputSource, XPathConstants.NODESET);
    But I want to be able to say:
    String expression="/parameter[@name='M2-*']/value/text()";
    Is this possible? If so, how can I do this?
    Thanks!

    As implemented in Java, XPath does not support regular expressions, but in most cases there are workarounds thanks to XPath functions. Correct me if I'm wrong, but comparing your expression with the XML document (there are no "name" attributes anywhere in it), I think you want the value of the <value> elements that have a <parameter> parent element and a <name> sibling element whose value starts with "M2-". If that is the case, you can use the following query expression:
    String expression = "//parameter/value[substring(../name,1,3)='M2-']";
    Sorry if I misunderstood the meaning of your expression, but I hope this will help you get the hang of using XPath functions as a substitute for regular expressions.
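
    For comparison, the same prefix filter can be sketched with Python's standard library. ElementTree supports only a limited XPath subset, so the prefix test is done in plain Python here. The <data> root element is an assumption, since the post shows only the <parameter> fragments, and the attributes on <value> are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Sample document: <data> is a hypothetical wrapper around the posted fragments.
XML = """<data>
  <parameter><name>M2-WIDTH</name><value>10.4518</value></parameter>
  <parameter><name>M2-GAP</name><value>2.864</value></parameter>
  <parameter><name>RES-LENGTH</name><value>2.864</value></parameter>
</data>"""

def values_with_prefix(xml_text, prefix):
    """Return (name, value) pairs for parameters whose name starts with prefix."""
    root = ET.fromstring(xml_text)
    result = []
    for param in root.iter("parameter"):
        name = param.findtext("name", default="")
        if name.startswith(prefix):
            result.append((name, param.findtext("value")))
    return result

print(values_with_prefix(XML, "M2-"))
```

    XPath 1.0 also provides starts-with(), so a predicate such as [starts-with(../name, 'M2-')] expresses the same test without counting characters.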

  • 8830 phone book number to name matching

    I have a BB 8830 World Edition. My service provider is Reliance.
    The phone numbers stored in the phone book start with (+91) National Code for India and then the 10 digit mobile number or 8 digit mobile number.
    Its the same way on my Nokia E51 and iPhone. When I get a call, the name is matched with the number correctly on the Nokia and iPhone. While on my BB, the name is not matched.
    I found that the name is correctly mapped on the BB only when the number is stored as 0 followed by the 10 digit mobile number. Why does the standard mapping used on all other phones not work with the BB? The standard National Code + Phone number mapping used to work with my previous Moto Razr too. Why does it not work with the BB?

    I'm not sure the "why" is really important.
    Set your country code at Phone (press the green dial key) > Options > Smart Dialing. Enter your country code and national number length there, and remove the leading country code from the phone book entries.
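
    To illustrate why the national number length matters for matching: if both the stored number and the incoming number are reduced to their last N digits, a "+91" prefix and a leading "0" stop making a difference. This is only a sketch of that idea in Python, not how the BlackBerry implements Smart Dialing; the sample numbers are made up and a 10 digit national length is assumed.

```python
def national_digits(raw, national_length=10):
    """Keep only the digits and return the last `national_length` of them."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[-national_length:]

def same_subscriber(a, b, national_length=10):
    """True if two differently formatted numbers share the same national part."""
    return national_digits(a, national_length) == national_digits(b, national_length)

print(same_subscriber("+91 98765 43210", "09876543210"))
```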

  • Making XSD element name match the column name/header

    The XML format of the answer I created looks like the following. How can I change the element name from C0, C1... to real column name? http://host:port/analytics/saw.dll?Go&searchid provided the XML
    <?xml version="1.0" encoding="utf-8" ?>
    - <RS xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
    - <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
    - <xsd:complexType name="R">
    - <xsd:sequence>
    <xsd:element name="C0" type="xsd:double" minOccurs="0" maxOccurs="1" saw-sql:type="double" saw-sql:displayFormula=""CUSTOMERS"."SALES"" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:tableHeading="CUSTOMERS" saw-sql:columnHeading="SALES" />
    <xsd:element name="C1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:displayFormula=""CUSTOMERS"."CITY"" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:tableHeading="CUSTOMERS" saw-sql:columnHeading="CITY" />
    </xsd:sequence>
    </xsd:complexType>
    </xsd:schema>
    - <R>
    *<C0>0.3</C0>*
    *<C1>WILLITS</C1>*
    </R>
    Edited by: user732932 on Jan 7, 2010 11:38 AM

    OR is there a way to pass the column headers from OBIEE to a URL? For example, session parameters can be passed using @{parmName}

  • Select File name using routine

    I am trying to write a routine in the InfoPackage for flat file extraction, which will select the flat file automatically according to sy-datum (the flat file is saved with the system date). Here is my example code. Please help me write the correct code.
    program filename_routine.
    * Global code
    *$*$ begin of global - insert your declaration only below this line  *-*
    TABLES:
    DATA: str1(60) TYPE c,
          str2(10) TYPE c,
          str3(10) TYPE c.
         str4(100) type c.
    str1 = 'C:\Documents and Settings\Yadavalli\Desktop'.
    str2 = 'Sy-datum'.
    str3 = 'csv'.
    *$*$ end of global - insert your declaration only before this line   *-*
    form compute_flat_file_filename
      changing p_filename type RSFILENM
               p_subrc like sy-subrc.
    *$*$ begin of routine - insert your code only below this line        *-*
             concatenate str1 str2 str3 into str4.
      p_filename =
    CONCATENATE str1 '\' str2 '.' str3 INTO str4.
      p_subrc = 0.
    *$*$ end of routine - insert your code only before this line         *-*
    endform.
    Regards
    Naga

    I thought of using the function module below, but it is not loading the data. It is showing red. Please let me know whether I have specified logical_filename correctly.
    *$*$ begin of routine - insert your code only below this line        *-*
    CALL FUNCTION 'FILE_GET_NAME'
      EXPORTING
      CLIENT                        = SY-MANDT
        logical_filename              = 'C:\'
      OPERATING_SYSTEM              = SY-OPSYS
      PARAMETER_1                   = ' '
      PARAMETER_2                   = ' '
      PARAMETER_3                   = ' '
      USE_PRESENTATION_SERVER       = ' '
      WITH_FILE_EXTENSION           = ' '
      USE_BUFFER                    = ' '
      ELEMINATE_BLANKS              = 'X'
    IMPORTING
      EMERGENCY_FLAG                =
       FILE_FORMAT                   =  'csv'
       FILE_NAME                     =  sy-datum
    EXCEPTIONS
      FILE_NOT_FOUND                = 1
      OTHERS                        = 2
    IF sy-subrc <> 0.
    MESSAGE ID SY-MSGID TYPE SY-MSGTY NUMBER SY-MSGNO
            WITH SY-MSGV1 SY-MSGV2 SY-MSGV3 SY-MSGV4.
    ENDIF.
    Regards
    Naga

  • Rename cluster resource - why doesn't the name match?

    I created a new NSS pool/volume that was clustered as per the docs.
    However, the actual resource object name gets auto-created and you cannot rename it?

    Kevin,
    It appears that in the past few days you have not received a response to your
    posting. That concerns us, and has triggered this automated reply.
    Has your problem been resolved? If not, you might try one of the following options:
    - Visit http://support.novell.com and search the knowledgebase and/or check all
    the other self support options and support programs available.
    - You could also try posting your message again. Make sure it is posted in the
    correct newsgroup. (http://forums.novell.com)
    Be sure to read the forum FAQ about what to expect in the way of responses:
    http://forums.novell.com/faq.php
    If this is a reply to a duplicate posting, please ignore and accept our apologies
    and rest assured we will issue a stern reprimand to our posting bot.
    Good luck!
    Your Novell Product Support Forums Team
    http://forums.novell.com/
