Parsing text for keywords

Hi there, I'm about to start a project where I analyse text and pick out all of the meaningful words, throwing away the determiners (such as "the", "a", "an", etc.).
I realise that I could compare my String to a whole list of these words, but I was wondering if anybody had any suggestions before I start. Is there a package that would save me the trouble of writing out a long list of words, or even any sample code? I'm googling as we speak but just thought I would ask here as well.
Thanks in advance
oookiezooo

Yeah, I already have a lot of knowledge of parsing text, but frankly the idea of writing a HUGE list of words I don't want to include doesn't appeal to me; that's why I asked. Thanks for trying though!
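
For what it's worth, the filtering step itself is small; the work is in the word list. Below is a minimal sketch of the idea, assuming a tiny hand-picked stop-word set. The class name `KeywordExtractor`, the method `keywords` and the contents of `STOP_WORDS` are all made up for illustration. In practice the list would probably be loaded from a file, and libraries such as Apache Lucene ship ready-made English stop-word lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordExtractor {
    // Tiny illustrative stop-word set (an assumption, not a complete list).
    // A real list is much longer and is better loaded from a text file.
    private static final Set<String> STOP_WORDS = Set.of(
            "the", "a", "an", "and", "or", "of", "to", "in", "on", "is");

    // Returns the words of the text, in order, that are not stop words.
    public static List<String> keywords(String text) {
        List<String> result = new ArrayList<>();
        // Lower-case the text, then split on runs of anything that is not
        // a letter or digit (so apostrophes also split words).
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [cat, jumped, over, house]
        System.out.println(keywords("The cat jumped over the house."));
    }
}
```

A hash-based `Set` keeps each lookup cheap, so the length of the stop-word list has little effect on speed.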

Similar Messages

  • Search text for keywords - innodb table

    I have a longtext column in a table that I need to search
    through for keywords. The table is in InnoDB format. I don't want to
    change it to MyISAM because I can't afford to have it lock at the
    table level... I prefer the row-level locking of InnoDB.
    How can I build a search around this? It would be nice to
    have all words, any words, and exact phrase as options, as seen
    in Tom Muck's extension (which I own)... however, this recordset has
    so many arrays, it's completely hand-coded, and the extension isn't
    supposed to work with anything but default recordsets.
    Any suggestions? How can I have a more comprehensive search
    using an InnoDB table?

    Tom Muck's extension lets you use keywords 3 ways... all
    words, any words, and exact phrase.
    http://www.tom-muck.com/extensions/help/DynamicSearchPHP/

  • Best Method For Keyword Search (Full Text Search)

    I have some cataloged columns I am searching for keywords.
    This table is getting huge, over 6.7 million records, and it is
    becoming slow. What is the best method to optimize the DB for this
    search?
    Do I need to create a column that
    will have a keyword associated with a description of each record,
    and search that particular column for the records? My
    client's records are at times stored like "Bolt, Flange" and some are
    stored as "Bolt, Flange 1/4inch.....". Does anyone have any idea of
    the best methodology for getting this keyword search optimized and
    returning faster query results?
    Thanks

    Consider creating a Verity collection on the appropriate
    columns in your database. The frequency of database update will
    help in determining if this is the appropriate thing to do.

  • Firefox is having a problem with LinkedIn. I get the error, error in parsing value for "background" and for "filter"

    When trying to send messages in LinkedIn, Firefox takes me back to the login screen without sending the message. The error console shows the errors: error in parsing value for "background" and error in parsing value for "filter". What is causing this?

    Pages does not support the Apple font used for color emoji, so that behavior is normal.
    With what app are you reading the Yahoo mail? There is really no guarantee that any other email service will show the special Apple font involved.
    You should have no problem putting emoji directly into Mail or Text edit via drag drop from the Character Viewer as shown below.
    You should also be able to upload graphics here easily by clicking on the camera icon.  My email is tom at bluesky dot org.

  • I have ca. 30 PDF documents I need to search for keywords; how can I do this on my Mac?

    I have ca. 30 PDF documents I need to search for keywords. When I open these documents in Adobe Reader on my Mac, it shows a Search tool; however, when I search for keywords I know are in the document, none are found. How can I do a keyword search?

    Do you know if the text has been through OCR? Are the original documents scans?
    An easy way to find out is to check whether you can select an individual word or letter. If you can only select a whole block of text, then the document will need to be put through Optical Character Recognition (OCR) software first to enable keyword searching.

  • Problem for using oracle xml parser v2 for 8.1.7

    My first posting was messed up. This is re-posting the same question.
    Problem for using oracle xml parser v2 for 8.1.7
    I have a stylesheet with
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">.
    It works fine if I refer to this XSL file in an XML file as follows:
    <?xml-stylesheet type="text/xsl" href="http://...../GN.xsl"?>.
    When I use this XSL in a PL/SQL package, I get
    ORA-20100: Error occurred while processing: XSL-1009: Attribute 'xsl:version' not found in 'xsl:stylesheet'.
    After I changed the namespace definition to
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> in the XSL file, I got
    ORA-20100: Error occurred while processing: XSL-1019: Expected ']' instead of '$'.
    I am using XML parser v2 for 8.1.7.
    Can anyone explain why it happens? What is the solution?
    Yi

    Quote, originally posted by Steven Muench:
    "Elements don't have text content; they contain text node children.
    So instead of trying to setNodeValue() on the element, construct a Text node and use the appendChild method on the element to append the text node as a child of the element."
    Steve,
    We are also creating an XML DOM from java and are having trouble getting the tags created as we want. When we use XMLText it creates the tag as <tagName/>value rather than <tagName>value</tagName>. We want separate open and close tags. Any ideas?
    Lori
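
    The approach described in the quote (append a Text node as a child of the element) can be sketched as follows. Note the assumption: this uses the standard JAXP / org.w3c.dom API that ships with the JDK, not the Oracle XML parser v2 classes mentioned in the thread, and the class name `TextNodeExample` and method `build` are made up for illustration. It only shows the Text-node technique; it is not a diagnosis of the XMLText behaviour Lori describes.

    ```java
    import java.io.StringWriter;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class TextNodeExample {
        // Builds an element whose text is a child Text node and serializes it.
        public static String build(String tagName, String value) {
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().newDocument();
                Element element = doc.createElement(tagName);
                // The text lives in a child node, not in the element's own value.
                element.appendChild(doc.createTextNode(value));
                doc.appendChild(element);

                // Serialize without the <?xml ...?> declaration.
                Transformer transformer = TransformerFactory.newInstance().newTransformer();
                transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                transformer.transform(new DOMSource(doc), new StreamResult(out));
                return out.toString();
            } catch (Exception e) {
                throw new IllegalStateException("Could not build the XML", e);
            }
        }

        public static void main(String[] args) {
            System.out.println(build("tagName", "value"));
        }
    }
    ```

    With the JDK's built-in implementation this should serialize with separate open and close tags around the text.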

  • Searching files for keywords

    Hi all,
    I am in the process of building a shell script as part of an auditing utility. It will search a specified directory for keywords and output the file path and the line number that each word was found on. I built a test script (shown below) that does just this, but egrep apparently does not allow MS Word, Excel, etc. documents to be read. I was wondering if someone could point me in an alternate direction that would allow me to search these types of documents as well? (Wordfile is a file that is created elsewhere with a list of words to search for, e.g. bus)
    Thanks!
    # TMPDIR is assumed to be set already. HOSTNAME is set before the
    # here-document because ${HOSTNAME} is expanded when scanit is written.
    HOSTNAME=`hostname`
    export HOSTNAME
    cat << EOF > ${TMPDIR}/scanit
    rm -f ${TMPDIR}/strings
    strings "\$1" | egrep -n -i -f ${TMPDIR}/wordlist >> ${TMPDIR}/strings
    if [ -s ${TMPDIR}/strings ]
    then
    echo >> ${TMPDIR}/${HOSTNAME}.o
    echo "File: \$1" >> ${TMPDIR}/${HOSTNAME}.o
    file "\$1" >> ${TMPDIR}/${HOSTNAME}.o
    cat ${TMPDIR}/strings >> ${TMPDIR}/${HOSTNAME}.o
    fi
    rm -f ${TMPDIR}/strings
    EOF
    if [ $# -eq 0 ]
    then
    echo "You must specify the start of the directory tree to search"
    exit 1
    fi
    # List the files, then turn each path into a line: sh -x scanit "path"
    find "$1" -type f 2> ${TMPDIR}/${HOSTNAME}finderrors | tee ${TMPDIR}/${HOSTNAME}_filelist | \
    head -100 |\
    sed -e "s+^+sh -x ${TMPDIR}/scanit \"+" -e 's/$/"/' > ${TMPDIR}/scanitnow
    sh -x ${TMPDIR}/scanitnow 1> ${TMPDIR}/${HOSTNAME}scanrun 2>&1
    cd ${TMPDIR}
    # In date formats %m is the month and %M the minute.
    if [ -s ${HOSTNAME}.o ]
    then
    date "+%Y%m%d_%H:%M:%S: indicators found on ${HOSTNAME}" > ${HOSTNAME}scanresults.csv
    cat ${HOSTNAME}.o >> ${HOSTNAME}scanresults.csv
    else
    date "+%Y%m%d_%H:%M:%S: No indicators found on ${HOSTNAME}" > ${HOSTNAME}scanresults.csv
    fi
    zip ${HOSTNAME}_scan.zip ${HOSTNAME}finderrors ${HOSTNAME}_filelist ${HOSTNAME}scanrun ${HOSTNAME}scanresults.csv

    I don't think that info is included in metadata (though I could be wrong - check out Query Programming and Metadata attributes). If line numbers are a key part of this, then you're probably going to have to (a) make a quick conversion of Office files to plain text using textutil, or (b) use osascript to search Word via AppleScript. Trying to read a Word doc as plain text in Unix is going to give you mounds of headaches (particularly if the 'fast save' option is on in Office, since that will save changes non-sequentially on disk).

  • Parse Text-File into array

    Hi,
    I have a text file with a structure like this.
    "sfdgasdf" "sadsadsadf" "sadfsdfasfd"
    "qwevsdf" "sdgfasdfsafd" "yxvcyxvcyxvc"
    "hgfddfhhfdfdf" "ewrtqwrwqewqr" "dfgdgdgsdgsdfgsdgg"
    My aim is to read this text file (*.txt) and parse it into a string array (or whatever is best). The contents between the quotation marks should be inserted into this array line by line.
    For example:
    array[0][0] = "sfdgasdf";
    array[0][1] = "sadsadsadf";
    array[0][2] = "sadfsdfasfd";
    array[1][0] = "qwevsdf";
    How can I achieve this?
    Thanks
    Jonny
    That's how far I got (not very far).
    File file = new File("c:\\temp\\text.txt");
    FileReader stream = new FileReader(file);

    Hi,
    still facing some problems.
    My text which I want to parse:
    "sfdgasdf" "sadsadsadf" "sadfsdfasfd"
    "qwevsdf" "sdgfasdfsafd" "yxvcyxvcyxvc"
    "hgfddfhhfdfdf" "ewrtqwrwqewqr" "dfgdgdgsdgsdfgsdgg"
    My code:
    String[] parse = text.split("\"");
    The array that is created has whitespace and line breaks as elements. I only want the characters in between the quotation marks.
    How can I "tell" the split function not to insert them in my array?
    Cheers
    Jonny
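
    One possible way around the split problem is to match the quoted fields directly instead of splitting on the quote character. The sketch below assumes the file contents have already been read into a String (for example with java.nio.file.Files) and that fields never contain an escaped quote. The class name `QuotedFieldParser` and method `parse` are made up for illustration.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class QuotedFieldParser {
        // Matches one double-quoted field and captures the text between the quotes.
        private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

        // Returns one String[] per line that contains at least one quoted field.
        public static String[][] parse(String text) {
            List<String[]> rows = new ArrayList<>();
            // \R matches any line break (\n, \r\n, ...).
            for (String line : text.split("\\R")) {
                List<String> fields = new ArrayList<>();
                Matcher m = QUOTED.matcher(line);
                while (m.find()) {
                    fields.add(m.group(1));
                }
                if (!fields.isEmpty()) {
                    rows.add(fields.toArray(new String[0]));
                }
            }
            return rows.toArray(new String[0][]);
        }

        public static void main(String[] args) {
            String text = "\"sfdgasdf\" \"sadsadsadf\" \"sadfsdfasfd\"\n"
                    + "\"qwevsdf\" \"sdgfasdfsafd\" \"yxvcyxvcyxvc\"";
            String[][] array = parse(text);
            System.out.println(array[0][1]); // sadsadsadf
            System.out.println(array[1][0]); // qwevsdf
        }
    }
    ```

    Because only the captured group is stored, the whitespace and line breaks between fields never reach the array.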

  • Scan textfield for keyword and apply formatting

    I was interested in searching through text in a textfield, and applying text formatting to keywords. For example, every time the word 'the' appears, apply a text format that changes it to green and 14pt. Here is an example of a format and text applied to a textfield. How would I go about searching through the textfield and applying this format only to specific words?
    my_txt.text = 'The cat jumped over the house.'
    /// my format I want to apply
    with (_lt_fmt) {
                    align = 'left';
                    blockIndent = 0;
                    bold = false;
                    bullet = false;
                    color = _green;
                    font = FontNames.ARIAL;
                    indent = 0;
                    italic = false;
                    kerning = false;
                    leading = 0;
                    leftMargin = 0;
                    letterSpacing = 0;
                    rightMargin = 0;
                    size = 14;
                    tabStops = [];
                    target = "";
                    underline = false;
                    url = "";
    }

    " I replaced some var names b/c they were reserved words"
    There were no reserved words for the current or application scope.
    "How can I keep all the words highlighted in the different formats?"
    Comment out this line:
    main_txt.setTextFormat(main_txt.defaultTextFormat);
    Also, the code you showed is too verbose. You can combine declaration and instantiation in one place and have 5 lines instead of 10:
    var highLightFormat0:TextFormat = new TextFormat("Arial",14,0xff00ff,"bold");
    var highLightFormat1:TextFormat = new TextFormat("Arial",7,0xff0000,"bold");
    var highLightFormat2:TextFormat = new TextFormat("Arial",9,0xCCCCCC,"bold");
    var highLightFormat3:TextFormat = new TextFormat("Arial", 8, 0xffEE00, "bold");
    var main_txt:TextField = new TextField();
    In addition, the function getTxtFmt and the way you deal with getting TextFormats is overkill - conditionals are worse than direct references. So, I suggest your code be:
    import flash.text.TextFormat;
    // keywords to highlight
    var wordsToSearch:Vector.<String> = new <String>['the','interested','text', 'applying'];
    // TxtFormats
    var highLightFormats:Vector.<TextFormat> = new <TextFormat>[new TextFormat("Arial", 14, 0xff00ff, "bold"), new TextFormat("Arial", 7, 0xff0000, "bold"), new TextFormat("Arial", 9, 0xCCCCCC, "bold"), new TextFormat("Arial", 8, 0xffEE00, "bold")];
    // Create TextField and add to display list
    var main_txt:TextField = new TextField();
    with (main_txt) {
         multiline = main_txt.wordWrap = true;
         autoSize = "left";
         width = 400;
         defaultTextFormat = new TextFormat("Arial",12);
         x = main_txt.y = 20;
         text = "I was interested in searching through text in a textfield, and applying text formatting to keywords. For example, every time the word 'the' appears, apply a text format that changes it to green and 14pt. Here is an example of a format and text applied to a textfield. How would I go about searching through the textfield and applying this format only to specific words?";
    }
    addChild(main_txt);
    // Iterate through Vector of keywords
    for (var i:int; i < wordsToSearch.length; i++){
         search(wordsToSearch[i], i);
    }
    // find whole words
    function search(keyword:String, fmtChoice:int):void {
         //main_txt.setTextFormat(main_txt.defaultTextFormat);
         var txt:String = main_txt.text;
         var pattern:RegExp = new RegExp("\(\?\<\=\\s)" + keyword + "\\s","ig");
         var theResult:Object = pattern.exec(txt);
         while (theResult) {
              main_txt.setTextFormat( highLightFormats[fmtChoice], theResult.index, theResult.index + keyword.length);
              theResult = pattern.exec(txt);
         }
    }

  • Error while trying to retrieve text for error ORA-12154

    Hello,
    I have been trying for several days to install PHP 5.1.2 with the OCI8 extension on a Win2003 server running IIS6, without success.
    On my server I have a 920 Oracle client and the 10.1 Instant Client; I copied the tnsnames.ora into the Instant Client's directory.
    I've declared several environment variables:
    - NLS_LANG : AMERICAN_AMERICA.WE8MSWIN1252
    - TNS_ADMIN : E:\...\oracle\instantclient_10_1
    - ORA_NLS33 : E:\..\oracle\920\ocommon\nls\ADMIN\DATA
    With the PHP command line, the oci_connect function works correctly: the PHP command line uses the Instant Client's tnsnames.ora, and I can query my database successfully.
    When I try to load a web PHP script (the same as the command-line script), I get the following error: "Error while trying to retrieve text for error ORA-12154" (from oci_connect( $user , $pass, $sid )). The $sid variable holds an alias declared in the tnsnames.ora.
    If I replace the alias with something like "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xx.xx.xx.xx)(PORT=1521)))(CONNECT_DATA=(SID=xx)" in the oci_connect call, I get another error: Error while trying to retrieve text for error ORA-12705.
    A web page calling the phpinfo function displays the following about the oci8 extension, which seems to be correct:
    oci8
    OCI8 Support enabled
    Revision $Revision: 1.269.2.8 $
    Active Persistent Connections 0
    Active Connections 0
    Temporary Lob support enabled
    Collections support enabled
    Do you have any ideas? Thanks a lot.

    The web server is not seeing the Oracle environment correctly. You need to set PATH to the instant client libraries. ORA_NLS33 is not used for Oracle 10g clients. Perhaps you have some library conflict with two versions of Oracle on the machine?
    These may help:
    http://www.oracle.com/technology/tech/php/htdocs/php_troubleshooting_faq.html#envvars
    http://blogs.oracle.com/opal/2006/05/01

  • Having custom text for 'Actual' and 'Target' in Funnel chart

    Hi,
    We see 'Actual' and 'Target' label values in the funnel chart when we hover the mouse over the chart.
    But I need to change the text to custom text for a SINGLE graph. I don't want to change any XML or config files; I need this change in a single report only.
    I appreciate any posts that help.
    Regards
    MuRam
    Edited by: MuRam on Dec 31, 2012 7:36 PM

    http://www.adobe.com/cfusion/mmform/index.cfm?name=wishform

  • Tool Tip Text for field values in ALV report

    Hi,
    How to get the tool tip text for the field values in ALV report.
    Thanks & Regards,
    Pallavi.

    Hi,
    In the field catalog, specify the TOOLTIP field:
    LVC_S_FCAT-TOOLTIP
    Set it to the tooltip text you want,
    then append the entry to the field catalog.
    Hope this solves your problem.

  • Activate text for Cost Center for ME51N, ME52N, ME53N

    Hi, experts
    As a requirement for transactions ME51N, ME52N and ME53N, I need to activate the text for the Cost Center field on the "Account assignment" tab. How can I do this?
    Thanks in advance.
    Is there any path or exit that could help with it?

    On the "Account assignment" tabstrip, I need to add a text description (on the right side) for each of the
    CO Area and Cost Center fields.
    How can I do this? Thanks in advance.

  • Help Text for Field Name.....

    Hi Experts,
    In an ALV report there are field names like Order No., Qty, etc.
    When the user moves the cursor to a field name, e.g. Qty, it should show help text such as "This Qty is for A-B...".
    How can I show help text for a field name when the cursor moves over it?
    Please guide.
    Yusuf

    Hi Shiva,
    There is no field TOOLTIP in SLIS_FIELDCAT_ALV.
    My syntax is:
      w_fcat-col_pos     = 9.
      w_fcat-fieldname = 'FACTOR'.
      w_fcat-seltext_l = 'Stock Value (55 %)'.
      w_fcat-outputlen = 18.
      w_fcat-do_sum = 'X'.
      APPEND w_fcat.
      CLEAR w_fcat.
    Is there any other way, because there is no field like tooltip?
    Yusuf

  • How to change Alt text for the Popup Key LOV Image in Apex  3.2.1.00.10

    We are using Application Express version 3.2.1.00.10.
    There is an icon to click on to pop up the LOV search box; the alt text for that image is currently "Popup Lov".
    Would it be possible to change the text to something more meaningful, e.g. "Lookup Person name" or "search Directory for Person names"?
    I have tried updating it
    from
    Shared Components>Templates> Popup List of Values Template > Popup Icon Attr --> width="13" height="13" alt="Popup Lov"
    (under Popup List of Values)
    to
    alt="#CURRENT_ITEM_NAME#"
    but it didn't work.
    Your response will help with getting accessibility sign-off.

    Venu,
    Try adding title = "Lookup Person name" to the Image Attributes of your icon or button.
    Jeff
