Getting the meaning of the article

I am trying to parse news articles. The plan is to tokenize the article first and then remove the most common English words, such as "a", "an" and "the", which usually have no relation to the meaning of the article.
Then I will count the total number of words in the article and the number of repetitions of each word, to get the frequency of occurrence of each word.
My assumption is that the word with the highest frequency is strongly related to the meaning of the article.
To extract the code I am using the URL class.
Can somebody help me with how I should go about implementing this?

I'm not sure why you are using the URL class, unless you mean that the articles are on the internet, and you are retrieving them from there.
I'm not about to give you a complete solution, but look at the java.util.StringTokenizer class to handle splitting the text into words. You could then have a java.util.Set of words that need to be ignored (I suggest using HashSet).
Looking at the Java API documentation is always a good start, but since the API is so big, it's always good to get a nudge in the right direction.
Robin
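
For what it's worth, here is a minimal sketch of the approach discussed above: tokenize with java.util.StringTokenizer, skip words found in a HashSet of ignored words, and count the rest. The class and method names (WordFrequency, countWords, mostFrequent) and the very short stop-word list are assumptions made up for illustration, not something from the original posts. Note also that the Javadoc describes StringTokenizer as a legacy class, so String.split would work just as well.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

class WordFrequency {
    // Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and", "in", "to", "is"));

    /** Counts how often each non-stop word occurs in the text (case-insensitive). */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on whitespace and common punctuation.
        StringTokenizer tokens =
                new StringTokenizer(text.toLowerCase(), " \t\n\r\f.,;:!?\"()");
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            if (!STOP_WORDS.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the word with the highest count, or null if the map is empty. */
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String article = "The election results are in. The election was close, "
                + "and the recount of the election is pending.";
        Map<String, Integer> counts = countWords(article);
        String top = mostFrequent(counts);
        System.out.println(top + " = " + counts.get(top)); // prints "election = 3"
    }
}
```

To turn a count into a frequency, divide it by the total number of words counted. Fetching the article text (for example through the URL class) is left out here on purpose.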

Similar Messages

  • How to get the meaning of the regular expression?

    Hello All,
    I want to know how to get the meaning of a regular expression.
    The requirement is this: I need to get the regular expression for some of the attributes, and if a value does not match that regular expression, I need to show a popup explaining the limitation of the attribute. The popup has to be in a format the user understands,
    like "please give a to z or 1 to 9".
    So is there any way Java can help me get the meaning of a regular expression?
    Thank You!
    Arun S

    I'm not aware of any such tool or library.
    Also, it would be a terrible "explanation", because regular expressions (similar to other programming languages) have their own "style" of defining what to enter and that usually doesn't translate well into natural language.
    For example, the regex "[a-z][a-z0-9]*\s+[a-z0-9]+" could be translated as "a to z, followed by zero or more characters from a to z or 0 to 9, followed by one or more whitespace characters, followed by one or more characters from a to z or 0 to 9".
    Or you could simply say "Please enter two alphanumeric words, the first one must not start with a number".
    My suggestion: store/configure the human-readable description together with the regex. Don't try to automate it.
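
    The suggestion above could look roughly like this in Java: keep the regex and its human-readable message side by side, and show the message when validation fails. DescribedPattern and validate are hypothetical names invented for this sketch; only java.util.regex.Pattern is a real library class.

```java
import java.util.regex.Pattern;

class DescribedPattern {
    private final Pattern pattern;
    private final String description;

    DescribedPattern(String regex, String description) {
        this.pattern = Pattern.compile(regex);
        this.description = description;
    }

    /** Returns null if the whole value matches, otherwise the message to show the user. */
    String validate(String value) {
        return pattern.matcher(value).matches() ? null : description;
    }

    public static void main(String[] args) {
        DescribedPattern twoWords = new DescribedPattern(
                "[a-z][a-z0-9]*\\s+[a-z0-9]+",
                "Please enter two alphanumeric words, the first one must not start with a number");
        System.out.println(twoWords.validate("abc 123"));  // prints "null" (input is valid)
        System.out.println(twoWords.validate("1abc 123")); // prints the description
    }
}
```

    In a real application the regex/description pairs would more likely come from configuration than from code, but the idea is the same.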

  • iOS 7 doesn't work on iPod touch 4G, so does that mean it's the end for this device? Or are they going to make updates specially for older devices and apps that are compatible?

    iOS 7 doesn't work on iPod touch 4G, so does that mean it's the end for this device? Or are they going to make updates specially for older devices and apps that are compatible?

    Which apps remain compatible will be up to the individual developers who sell those apps. There will be no further iOS updates, but those aren't really necessary to continue to use and enjoy your device. I still have an iPad 1 running on iOS 5, and it is fully functional.
    The App Store has an app called VintApps 3.1.1 that is specifically geared toward iOS 3.1; however, you would probably be able to run most of those apps on your iPod. Considering that there will be a rather large market of iOS 6 app users, it is not likely that you will see a significant reduction in choices any time soon.
    It is not the end for the device in any sense except that it is at its ultimate iOS level. And again, I have an iPad that is at its ultimate iOS level (iOS 5), and I get new apps all the time that work for both my iPad and my iPhone that runs iOS 6.1.1.
    No need to fret....
    Cheers,
    GB

  • What's the meaning of the various .bin files used by SAM?

    hey everyone,
    I'm trying to understand why exactly there are so many .bin files used by the SAM, and how to compile each one of them.
    My .rc file for the simulation contains:
    load bin reset.bin 0xfff0000000
    load bin q.bin 0xfff0010000
    load bin openboot.bin 0xfff0080000
    load bin nvram1 0x1f11000000
    load bin 1c1t-md.bin 0x100000
    load bin 1c1t-hv.bin 0x180000
    load bin disk1.img 0x1f40000000
    I don't understand why I have 3 different files - reset.bin, q.bin and openboot.bin when I would have expected only one bootfile.
    I also don't know what's the meaning of the 1c1t-md.bin file (what md stands for).
    If anyone could clear this subject for me, and also point me to the sources of each such file (if they exist in the project, and assume they do, at least for most of them), I'd be grateful.
    Thanks,
    Mintz Yuval

    Hi,
    Sorry I haven't replied sooner.
    I get your project now and it sounds pretty cool.
    Let me answer your last question first. The service processor resides on a separate motherboard and runs independently of the N2 processor. The SP and the N2 processor communicate via an I2C link, which allows the SP to read and alter the state of the N2. The CPU, caches, NIU, memory, etc. are all connected by this link. Once the N2 is running, the SP and the N2 firmware may exchange messages via a mailbox protocol over the I2C link.
    nvram1.bin is a configuration file that is OBP specific. On HW, it would be a little ROM that obp reads during the boot. Since it's in memory, SAM doesn't make any assumptions about it. You can remove it or rework it to suit u-boot.
    disk.img is the contents of a virtual disk which contains the root filesystem. This is essentially a ram disk which has been added to SAM's version of the hypervisor and/or obp. On HW, the firmware does not support this kind of virtual ram disk. For SAM, I'm pretty sure that obp is extended to create a device tree node for the virtual disk. Hypervisor may not know about the virtual disk.
    Since disk.img is a UFS (or other Solaris-oriented) filesystem, you will probably want to replace it with an ext3 or other Linux-oriented filesystem. Since Linux has direct kernel support for ramdisks, you might be able to build/configure the kernel with the location of the root filesystem image in memory. Or this might be a u-boot option.
    As to 1c1t*.bin and reset.bin: on HW, the service processor is responsible for discovering all the devices and CPUs on a system and writing the machine description file (e.g. 1c1t*.bin) and the reset firmware (e.g. reset.bin). These files are used during boot and various resets to initialize or reinitialize machine state for the CPUs and all the I/O devices. Since SAM doesn't include a service processor, these files are static and created completely outside of SAM. The contents of the md*.bin files must match the configuration of CPUs in the sam.rc file; firmware will fail if it tries to run code on a strand that is missing.
    The md and reset.bin files are used during the boot process to set-up CMT registers and do a little bit of i/o device initialization. SAM doesn't support reset modes such as warm reset.
    I hope this helps. Let me know if you need me to track down more details.
    Cheers,
    Stephen

  • [EWS][FastTransfer][MS-OXCFXICS] Question about paging in the Fast Transfer Stream after exporting message and the meaning of the config in the previous section of the stream.

    Hi, all.
    First, I want to construct a fast transfer stream programmatically, rather than using the exported item's stream, and import that stream into a folder by posting an import request in EWS.
    So I exported an item through EWS and investigated the structure of the exported stream by reading [MS-OXCFXICS].pdf. I can parse each MAPI property, marker and meta-property from the stream.
    I found that the stream is made of many pages. Most pages have a length of 0x7BC0; some are shorter than 0x7BC0. In the leading section of the stream there is also some configuration. So the questions are:
    1. What is the paging rule, or what is the principle behind the paging? Must a complete propvalue/marker (see [MS-OXCFXICS].pdf) be contained in one page, except for some binary types?
    2. Why is the page's binary length 0x7BC0? Or where is the paging length configured?
    3. The stream is made of FxOpcodes and a sub-buffer for each FxOpcode, and I found the meaning of the FxOpcodes:
    internal enum FxOpcodes
    {
        None = 0,
        Config = 1,
        TransferBuffer = 2,
        IsInterfaceOk = 3,
        TellPartnerVersion = 4,
        StartMdbEventsImport = 11,
        FinishMdbEventsImport = 12,
        AddMdbEvents = 13,
        SetWatermarks = 14,
        SetReceiveFolder = 15,
        SetPerUser = 0x10,
        SetProps = 0x11
    }
    After parsing, the stream has four parts:
    A. OpCode:Config, value:0000000001000000
    B. OpCode:IsInterfaceOk, value:010000000703020000000000C00000000000004600240080
    C. OpCode:TellPartnerVersion, value:000F91838417
    D and following: OpCode:TransferBuffer, Count:31680
    So:
    a. What does the OpCode:Config value mean?
    b. Part of the OpCode:IsInterfaceOk value is IID_IMessage ("00020307-0000-0000-C000-000000000046"). What do the other parts mean?
    Thanks in advance for any ideas.
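
    As a side note, the observation that the IsInterfaceOk value contains IID_IMessage can be checked with a few lines of Java. This is only a sketch: IidDecoder, hexToBytes and guidAt are hypothetical names, the offset of 4 bytes is simply where the GUID appears in the value quoted above, and the meaning of the four bytes before it and the four bytes after it is not known here. The decoding assumes the usual Windows GUID layout (first three fields little-endian, last eight bytes in order).

```java
class IidDecoder {
    /** Converts an even-length hex string into bytes. */
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Formats the 16 bytes starting at off as a GUID string. */
    static String guidAt(byte[] b, int off) {
        // Byte order for printing: Data1, Data2 and Data3 are stored little-endian.
        int[] order = {3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < order.length; i++) {
            if (i == 4 || i == 6 || i == 8 || i == 10) {
                sb.append('-');
            }
            sb.append(String.format("%02X", b[off + order[i]] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] value = hexToBytes("010000000703020000000000C00000000000004600240080");
        // prints 00020307-0000-0000-C000-000000000046
        System.out.println(guidAt(value, 4));
        // 0x7BC0 is the same number as the TransferBuffer count of 31680.
        System.out.println(0x7BC0); // prints 31680
    }
}
```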

    Hi Haiyang,
    Your questions are not covered by the Open Specification documentation and this is documented in [MS-OXWSBTRF] — v20141018 section “1.3 Overview”.
    The upload and export data stream is an opaque format that only needs to be understood by a server implementation. The client only serves as a repository for the opaque data stream so that it can be uploaded to the server at a later time.
    You made good progress in interpreting the buffer using other documents. Why do you need to be able to parse the data? Please describe your project in detail and I might submit a suggestion to document the layout of the data stream, if your project justifies it. You can send the description directly to me.
    Thanks, Vilmos

  • [EWS][FastTransfer] Question about paging in the Fast Transfer Stream after exporting message and the meaning of the config in the previous section of the stream.

    OK. Thank you.
    Today I built a stream with the following paging rule:
    1. Keep each page's byte length at 31680.
    2. Make sure each property value is complete within a page, except for binary-type properties whose value length is larger than 31680.
    It works very well when importing the stream into the folder through EWS.
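
    A rough Java sketch of that paging rule follows. It is not taken from any specification: PagePacker and pack are hypothetical names, the input is assumed to be a list of already-serialized property values, and two further assumptions are made that the post does not confirm: a page is closed early (left shorter than 31680 bytes) when the next value would not fit, and a value larger than a page is simply cut at page boundaries.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

class PagePacker {
    static final int PAGE_SIZE = 0x7BC0; // 31680 bytes

    /**
     * Packs serialized values into pages. A value that fits in one page is
     * never split; a larger value is allowed to span several pages.
     */
    static List<byte[]> pack(List<byte[]> values) {
        List<byte[]> pages = new ArrayList<>();
        ByteArrayOutputStream page = new ByteArrayOutputStream();
        for (byte[] value : values) {
            if (value.length <= PAGE_SIZE && page.size() + value.length > PAGE_SIZE) {
                // The value would straddle a page boundary: close the current page early.
                pages.add(page.toByteArray());
                page.reset();
            }
            int offset = 0;
            while (offset < value.length) {
                int chunk = Math.min(PAGE_SIZE - page.size(), value.length - offset);
                page.write(value, offset, chunk);
                offset += chunk;
                if (page.size() == PAGE_SIZE) {
                    // Page is full: emit it and start a new one.
                    pages.add(page.toByteArray());
                    page.reset();
                }
            }
        }
        if (page.size() > 0) {
            pages.add(page.toByteArray());
        }
        return pages;
    }
}
```

    With two values of 20000 bytes each, this produces two pages of 20000 bytes; with a 10000-byte value followed by a 40000-byte value, it produces one full page of 31680 bytes and a second page of 18320 bytes.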

  • What is the meaning of the oracle service  "SERVICE=orclXDB"  ?

    Hi See below.
    What is the meaning of the oracle service "SERVICE=orclXDB" ?
    SQL> r
    1 select name,
    2 value
    3 from v$parameter
    4 where upper(name) in (
    5 'DISPATCHERS',
    6 'MAX_DISPATCHERS',
    7 'SHARED_SERVERS',
    8 'MAX_SHARED_SERVERS',
    9 'CIRCUITS',
    10 'SHARED_SERVER_SESSIONS',
    11 'LARGE_POOL_SIZE',
    12 'SESSIONS'
    13* )
    NAME                    VALUE
    sessions                170
    large_pool_size         0
    dispatchers             (PROTOCOL=TCP) (SERVICE=orclXDB)
    shared_servers          1
    max_shared_servers
    max_dispatchers
    circuits
    shared_server_sessions
    8 rows selected.
    SQL>
    SQL> select * from v$services;
    SERVICE_ID  NAME            NAME_HASH   NETWORK_NAME
    5           orclXDB         3468872077  orclXDB
    6           orcl            2392458149  orcl
    1           SYS$BACKGROUND  165959219
    2           SYS$USERS       3427055676
    SQL>

    Hi Siva,
    don't you remember Re: Net service name and service ? :-)

  • What is the meaning of the following?

    Hi, DB 10g.
    What is the meaning of the following:
    1 - Literals.
    2 - Expressions in SQL.
    3 - PL/SQL expressions.
    4 - Padded blanks.

    It is all documented.
    1 - Literal: https://web.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/sql_elements003.htm#sthref377
    The terms literal and constant value are synonymous and refer to a fixed data value. For example, 'JACK', 'BLUE ISLAND', and '101' are all character literals; 5001 is a numeric literal. Character literals are enclosed in single quotation marks so that Oracle can distinguish them from schema object names.
    2 -SQL Expressions: https://web.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/expressions001.htm
    About SQL Expressions
    An expression is a combination of one or more values, operators, and SQL functions that evaluates to a value. An expression generally assumes the datatype of its components.
    This simple expression evaluates to 4 and has datatype NUMBER (the same datatype as its components):
    2*2 
    SQL (simple) expression: https://web.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/expressions002.htm#sthref802
    simple_expression::=
    3 - PL/SQL expressions: https://web.stanford.edu/dept/itss/docs/oracle/10g/appdev.101/b10807/02_funds.htm#sthref211
    Expressions are constructed using operands and operators. An operand is a variable, constant, literal, or function call that contributes a value to an expression. An example of a simple arithmetic expression follows:
    -X / 2 + 3 
    Unary operators such as the negation operator (-) operate on one operand; binary operators such as the division operator (/) operate on two operands. PL/SQL has no ternary operators.
    The simplest expressions consist of a single variable, which yields a value directly. PL/SQL evaluates an expression by combining the values of the operands in ways specified by the operators. An expression always returns a single value. PL/SQL determines the datatype of this value by examining the expression and the context in which it appears.

  • Where can I find a list of the meaning of the symbols that can appear in the headline at the top of the screen on the ipad and iphone?

    where can I find a list of the meaning of the symbols that can appear in the headline at the top of the screen on the ipad and iphone?

    http://support.apple.com/kb/HT1558
    http://support.apple.com/kb/TA38663
    http://support.apple.com/kb/HT4982
    They are also listed in the built-in User Guide accessible directly on the iPhone in Safari.
    Find the "iPhone User Guide" bookmark, then go to iPhone at a Glance > Status icons

  • What's the meaning of the error code 1009?

    What's the meaning of the error code 1009?

    See this support document http://support.apple.com/kb/TS3694 if you are doing a restore.
    Another option is you are trying to access an iTunes store that is not supported in your country.

  • Hi! Does anybody know the meaning of the "ornament" feature in Logic software?

    Hi! Does anybody know the meaning of the "ornament" feature in Logic software?

    They're a bit odd....
    An ornament is just a visual frame in the Environment, that does nothing except, well...be there.
    I've seen some people use them in complex Environment setups to act as a border for a group of buttons, for example, so it acts like one of those little plastic boxes that you put knobs and switches onto, to make them look better.
    That's about it.

  • What is the meaning of the black line in the Layer 2 View?

    In the Layer 2 View, there are many black lines labeled "Ethernet 100M". What do they mean? Does it mean the device is an end device that is connected to a PC only?

    Since there has been no response to your post, it appears to be either too complex or too rare an issue for other forum members to assist you. If you don't get a suitable response to your post, you may wish to review our resources at the online Technical Assistance Center (http://www.cisco.com/tac) or speak with a TAC engineer. You can open a TAC case online at http://www.cisco.com/tac/caseopen
    If anyone else in the forum has some advice, please reply to this thread.
    Thank you for posting.

  • What is the meaning of the status value in resulted table via power shell command?

    Hello,
    I have questions about the results of these PowerShell commands:
    Get-WmiObject -Class Win32_Processor returns a Status of OK. What does it mean? What are the other possible status values?
    Get-WmiObject -Class Win32_ComputerSystem returns a Status of OK. What does it mean? What are the other possible status values?
    [System.Net.WebRequest]::Create($site) > GetResponse() returns a StatusCode of OK. What does it mean? What are the other possible status values?
    Would you please clarify ?
    Thanks and Kind Regards,
    Dipti Chhatrapati

    Dipti, check the MSDN page for that particular class for all the info.
    The values will differ from class to class.
    http://msdn.microsoft.com/en-us/library/aa394373(v=vs.85).aspx
    Status
    Data type: string
    Access type: Read-only
    Qualifiers: MaxLen (10)
    Current status of an object. This property is inherited from CIM_ManagedSystemElement.
    The values are:
    "OK"
    "Error"
    "Degraded"
    "Unknown"
    "Pred Fail"
    "Starting"
    "Stopping"
    "Service"
    "Stressed"
    "NonRecover"
    "NoContact"
    "LostComm"
    Thanks, Azam. When you see answers, please mark them as the answer if helpful, and vote as helpful.

  • CREATE CONTROLFILE REUSE DATABASE ....(what's the meaning of the 'REUSE')?

    I know the Oracle documents about 'CREATE CONTROLFILE':
    [http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5003.htm]
    but i don't understand the following statements:
    REUSE
    Specify REUSE to indicate that existing control files identified by the initialization parameter CONTROL_FILES can be reused, overwriting any information they may currently contain. If you omit this clause and any of these control files already exists, then Oracle Database returns an error.
    Please note the last sentence: "If you omit this clause and any of these control files already exists, then Oracle Database returns an error."
    I ran a test: no matter whether the control files exist or not, the statement re-creates the control files and no error occurs.
    I'm confused. What does that sentence mean?

    Hi,
    You use REUSE to indicate that the control files identified in the initialization parameter can be reused and overwritten; the database name stays the same.
    CREATE CONTROLFILE REUSE DATABASE "TEST" RESETLOGS NOARCHIVELOG
    If you want to change the database name, move the current control files to a different location and replace REUSE with SET:
    CREATE CONTROLFILE SET DATABASE "NEW_NAME" RESETLOGS NOARCHIVELOG
    I think you can get the error by running
    CREATE CONTROLFILE SET DATABASE "NEW_NAME" RESETLOGS NOARCHIVELOG
    without moving the control files. If I add REUSE before SET, it overwrites the control files:
    CREATE CONTROLFILE REUSE SET DATABASE "NEW_NAME" RESETLOGS NOARCHIVELOG
    Edited by: user9132844 on Apr 10, 2012 8:55 AM

  • How / where can I learn the meaning of the numbers and letters associated with each camera model?

    Also meaning of camera model names such as PowerShot, EOS, etc.
    Thanks!

    The PowerShot series are all point & shoot cameras. The letter prefix in the model number is a class within the point & shoot category. The "G" series PowerShots, for example, are advanced point & shoot bodies and somewhat high end considering they are point & shoot cameras (they're often the second camera that a DSLR owner will use when they're going somewhere that a DSLR is either not permitted or simply not practical). I had to travel for a business trip recently and I had room in my bag to throw my G1 X in... but not room for my larger DSLRs and large lenses.
    The EOS bodies are all SLR or DSLR (SLR = single lens reflex camera and without that letter "D" on the front it implies it's a "film" camera.  With the letter D it's a "digital" camera.  Canon does not make a "film" SLR anymore -- so that's really a historical note to mention the "SLR" category.)
    Within the EOS system, you could break the system into roughly three major groupings... entry, mid-level, and pro.
    All "Rebel" series bodies are entry level.  Note that "entry" level for a DSLR is still a world ahead of point & shoots -- so don't think of these as low-end cameras.  In terms of model numbers, in North America, Canon uses a letter/number/suffix combination.  Currently all Rebel bodies start with "T".  The first was the T1i, then T2i, T3i (and T3 without the "i" suffix), T4i, T5i, (and T5 without the "i" suffix), T6i, and T6s.   Higher numbers are more recent models (the T6i and T6s were just introduced a few weeks ago.)  If there is no suffix then it's a more basic model.  The "i" denotes the higher end of the entry range.  But recently Canon introduced the T6s which is the highest end Rebel model and has some features previously only found in mid-level models.
    The Rebels include "scene" based shooting modes commonly found on point & shoots in addition to the more advanced shooting modes which are (by far) the most popular among DSLR shooters. Part of the whole point of buying a DSLR is the incredible boost in image quality you get when you get a camera out of automatic mode, take control of the exposure settings, and use a larger lens and sensor.
    The Canon mid-level EOS cameras all have 2 numbers followed by the "D" suffix, e.g. 10D, 20D, 30D, 40D, 50D, 60D, and 70D (currently only the 60D & 70D are still produced). These bodies introduce features found on the pro-level bodies (like the 2nd LCD display on top, extra direct-setting buttons so you don't have to navigate menus, and a 2nd large control dial on the rear) but still retain the entry level features like the "scene" based shooting modes that beginners might use.
    The high-level bodies have just one numeric digit, like 1D, 5D, 6D, and 7D. But when Canon needs to create a newer version they can't really increase that numeric digit... so instead they rev the model by calling it a "Mark II", "Mark III", "Mark IV", etc. So it's not really the "5D" these days... it's the 5D Mark III (or simply 5D III). The 1D is now the 1D X. The 7D is now the 7D II.
    A few odd things happen in this category... first, all of these bodies with the exception of the 7D II get "full frame" sensors. This is a rather large image sensor which is the same size as a single frame of 35mm film. It measures roughly 36mm wide by 24mm tall. The entry and mid-level cameras all have "APS-C" size sensors which measure approximately 22mm wide by about 15mm tall (which is very large compared to what a point & shoot camera would have, but not as big as these full-frame sensors). When you look at images that have a tack-sharp subject... and yet a beautifully soft out-of-focus background... that requires a large sensor to produce that result. You cannot get that result with a point & shoot or camera phone.
    The 7D II still has an APS-C sensor, but it's the best APS-C sensor Canon makes today. That camera body is heavily optimized for fast-action photography (the 1D X is also optimized for very fast action shooting and outperforms the 7D II, but the 1D X is Canon's "flagship" camera, so no surprise there).
    The 6D also stands out, as it was introduced to be an "entry level" full-frame body. Prior to the 6D, all full-frame cameras were basically about $2500 or more (for the body only). This is a full-frame camera for about $1800 (body only price; that does not include a lens).
    Incidentally... these pro level bodies finally drop the "scene" based shooting modes that beginners might use. They also tend to have particularly advanced focusing systems that might even be a bit intimidating for beginners (except for the 6D, which is a bit of an exception; it's actually considered an "entry level" full-frame camera).
    With the exception of the 6D, I would shy away from recommending Canon's high-end bodies for beginners. I've found the focus system alone can be intimidating for people and you really do have to dedicate some time to learning the system. If you don't take the time to learn the system, then you're really wasting your money because these cameras are amazing IF (and only if) you take the time to learn what they can do that other cameras cannot do... and learn when, why, and how to exploit those advantages. It won't just happen without some effort on the part of the photographer.
    Tim Campbell
    5D II, 5D III, 60Da
