SpamAssassin Training from Bulk Data

I have my own 1000-message spam collection (gathered from my own spam-trap email address) and use this to initialise the Spam database (using sa-learn). I also initialise the Ham database with a similar selection of my own good emails.
However, the recipient headers on the spam contain my spam-trap domain, and the ham contains my own domain. Neither has the domain of the target system the database gets used on. Is there a potential problem with this? Does anyone know what significance is placed on the recipient domain name during the analysis?
I keep the most recent 1000 spam messages, deleting older ones, to keep the spam data up to date, and was considering running this through existing SpamAssassin installations. However, the above question has me hesitating over whether this would actually be a good idea.
Thanks for any clarification.
-david

Thanks Alex.
Yes, I think you are right that a 'correction' made to an already-processed message has more effect on accuracy than new messages added by me. I was thinking that a quarterly update from my own spam-trap could replace any ongoing training by the actual users, but thinking about it further, a few corrections by a user are probably worth more than a few hundred completely new messages, so maybe it's not worth the trouble.
However, you have put my mind at rest about the initial training.
Cheers
-david
For anyone searching for "bulk training" and coming across this message, here are the steps I use...
• Save the spam messages you want to use into a mailbox in your mail client (I use Mail), e.g. a mailbox called SpamLib.
• Copy the folder ~username/Library/Mail/Mailboxes/SpamLib.mbox/Messages to the server, e.g. to the server location /Messages.
• Run sa-learn, reading the messages from that folder:
sa-learn --spam /Messages
• Do the same for non-spam, using the --ham option.
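The corpus rotation mentioned in the original post (keep only the most recent 1000 spam messages, delete older ones) can be scripted before each training run. A minimal sketch in Python, assuming each message is an individual file in one folder; the folder path and keep count are placeholders for your setup:

```python
import os

def prune_corpus(folder, keep=1000):
    """Keep only the `keep` most recently modified message files in `folder`."""
    paths = sorted(
        (os.path.join(folder, name) for name in os.listdir(folder)),
        key=os.path.getmtime,
        reverse=True,               # newest first
    )
    for stale in paths[keep:]:      # everything beyond the newest `keep` goes
        os.remove(stale)
    return paths[:keep]
```

After pruning, run sa-learn --spam against the folder exactly as in the steps above.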

Similar Messages

  • Error while retrieving bulk data from mdm in staging server

I am getting an error while retrieving bulk data from MDM. The error reads: "NiRawReadError: An error occured when reading data from socket.".
Could anyone suggest possible causes of this error? Please reply soon.
    Moderator message: network error, relation to ABAP development unclear, please search for available answers, check with system administrators, consult SAP service marketplace.
    Edited by: Thomas Zloch on Nov 22, 2010 5:16 PM

Can you elaborate? I don't think the /*+ APPEND */ hint is working for me; I am still getting the same error.
If you have any other suggestions, I would like to hear them.
Should I commit after every 500 records or so? At present I commit once after the whole data set has been inserted.
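On the intermediate-commit question: committing every N rows is easy to sketch, though whether it actually helps depends on the database (in Oracle, frequent commits can slow a load down and invite ORA-01555). A rough illustration using Python's sqlite3 module as a stand-in for the real database; the table and column names are invented:

```python
import sqlite3

def insert_in_batches(conn, rows, batch=500):
    """Insert rows one at a time, committing after every `batch` rows."""
    cur = conn.cursor()
    for i, val in enumerate(rows, start=1):
        cur.execute("INSERT INTO t(val) VALUES (?)", (val,))
        if i % batch == 0:
            conn.commit()          # intermediate commit every `batch` rows
    conn.commit()                  # commit whatever is left over

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
insert_in_batches(conn, range(1200), batch=500)
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1200
```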

  • Delete bulk data from multiple tables

    Hi,
I have two tables from which data needs to be deleted based on a WHERE clause.
Can anyone tell me how to delete bulk data (more than 10,000 rows at a time) while keeping performance acceptable?
    Regards,
    Dinesh

LPS wrote:
"This will be a simple delete statement; make sure the columns referred to in the WHERE clause are indexed."
Indexing may or may not help. If he is deleting 10,000 rows out of 20,000 it won't help at all. In fact, indexing may make things worse, as it will slow down the delete.
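The advice above amounts to: one DELETE per table, both inside a single transaction, and don't count on indexes when removing a large fraction of the rows. A small illustration with Python's sqlite3 as a stand-in for the real database; the table names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders      (id INTEGER, status TEXT);
    CREATE TABLE order_items (order_id INTEGER, status TEXT);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, "old" if i % 2 else "new") for i in range(20)])
conn.executemany("INSERT INTO order_items VALUES (?, ?)",
                 [(i, "old" if i % 2 else "new") for i in range(20)])

with conn:  # one transaction: both deletes commit (or roll back) together
    conn.execute("DELETE FROM orders WHERE status = ?", ("old",))
    conn.execute("DELETE FROM order_items WHERE status = ?", ("old",))

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 10
```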

  • URGENT : Return Bulk data from Stored Procedure

Hi,
Tell me, how do I return a bulk of data from a stored procedure to a C++ program, when that data does not exist in the database but is derived while the stored procedure executes?
For example:
Table ABC
Field1 Field2 Field3
A 1 3
B 1 5
C 2 10
Table DEF
Field1 Field2 Field3
D 10 24
E 3 16
F 8 19
SP_TESTING
Depending on the values in both tables, for some range of conditions, a conclusion X is derived for each value in the condition range. Now I need to return this bulk of data X, together with the condition each value belongs to, back to the calling C++ code.
NOTE: A stored procedure is required, as there is a lot of processing needed before we conclude the result X for each value in the condition range. If I execute this logic from C++ instead of a stored procedure it is very slow, and speed is a prime requirement of my system.
Also, I'm not using any MFC class to access the database; I'm using _ConnectionPtr, _RecordsetPtr and _CommandPtr from msado15.dll.
One solution could be the use of temp tables. But as this process is used by a lot of different stored procedures, a temp table common to all of them would need something like 50 NUMERIC fields, 50 VARCHAR fields and so on, which doesn't seem like a very good solution to this problem. It sounds like something I would have done while in school: implement a dumb solution.
So, please suggest a solution as to how I can return bulk data in the form of recordsets from a stored procedure.
Regards
Shruti

    Use Out parameter mode
    SQL> CREATE OR REPLACE procedure a1 (x  OUT NUMBER, y  OUT NUMBER) AS
      2  BEGIN
      3        x:= 1;
      4        y:= 2;
      5  END;
      6  .
    SQL> /
    Procedure created.
    SQL> SET SERVEROUTPUT ON
    SQL> DECLARE
      2   a NUMBER :=3;
      3   b NUMBER :=4;
      4  BEGIN
      5   a1 (a,b);
      6      DBMS_OUTPUT.PUT_LINE( 'a = ' || a );
      7      dbms_output.put_line( 'b = ' || b );
      8  END;
      9  .
    SQL> /
    a = 1
    b = 2
PL/SQL procedure successfully completed.
By default, OUT parameters use copy semantics: the value is copied back to the calling program unit when the procedure returns. The NOCOPY hint tells PL/SQL to pass a reference to the calling program unit instead of making that copy.
    Khurram
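The demo above returns two scalars; for bulk result sets the usual Oracle pattern is a SYS_REFCURSOR OUT parameter that the client then fetches in batches. The client side of that batching can be sketched with the DB-API's fetchmany in Python (sqlite3 stands in for Oracle here, and the table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (cond INTEGER, x TEXT)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [(i, "X%d" % i) for i in range(2500)])

cur = conn.execute("SELECT cond, x FROM results ORDER BY cond")
total = 0
while True:
    batch = cur.fetchmany(1000)   # pull the result set down in chunks,
    if not batch:                 # rather than row by row or all at once
        break
    total += len(batch)
print(total)  # 2500
```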

  • PeopleSoft CS SAIP Announce Status Issue in Bulk Data Exchange Status

XML is generated in the provided directory path under SAIP "Web Service Targets", but "Announce Status" is blank under Bulk Data Exchange Status, and even the "Event Message Monitor" shows nothing!
We have activated all SAIP service operations and their delivered routings on our side.
The transaction status on the Bulk Data Exchange Status page says Announced, but "Announce Status" is blank on the same page.
Announce Status should have one of these possible values per PeopleBooks: Connector Error, Failure, Processing, Success, Unsupported.
What could be wrong? Please help. Thank you...
    Regards,
    Ashish

    You are welcome. I'm glad you got it back up.
    (1) You say you did the symbolic link. I will assume this is set correctly; it's very important that it is.
    (2) I don't know what you mean by "Been feeding the [email protected] for several weeks now, 700 emails each day at least." After the initial training period, SpamAssassin doesn't learn from mail it has already processed correctly. At this point, you only need to teach SpamAssassin when it is wrong. [email protected] should only be getting spam that is being passed as clean. Likewise, [email protected] should only be getting legitimate mail that is being flagged as junk. You are redirecting mail to both [email protected] and [email protected] ... right? SpamAssassin needs both.
    (3) Next, as I said before, you need to implement those "Frontline spam defense for Mac OS X Server." Once you have that done and issue "postfix reload" you can look at your SMTP log in Server Admin and watch as Postfix blocks one piece of junk mail after another. It's kind of cool.
    (4) Add some SARE rules:
    Visit http://www.rulesemporium.com/rules.htm and download the following rules:
    70sareadult.cf
    70saregenlsubj0.cf
    70sareheader0.cf
    70sarehtml0.cf
    70sareobfu0.cf
    70sareoem.cf
    70sarespoof.cf
    70sarestocks.cf
    70sareunsub.cf
    72sare_redirectpost
    Visit http://www.rulesemporium.com/other-rules.htm and download the following rules:
    backhair.cf
    bogus-virus-warnings.cf
    chickenpox.cf
    weeds.cf
    Copy these rules to /etc/mail/spamassassin/
    Then stop and restart mail services.
    There are other things you can do, and you'll find differing opinions about such things. In general, I think implementing the "Frontline spam defense for Mac OS X Server" and adding the SARE rules will help a lot. Good luck!
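The download-and-copy steps above can be expressed as a short script. This sketch only builds the URL-to-destination mapping for the first batch of rule files; the /rules/ path on rulesemporium.com is an assumption, so verify the site layout before actually downloading anything:

```python
import os

BASE_URL = "http://www.rulesemporium.com/rules/"   # assumed layout; verify first
DEST = "/etc/mail/spamassassin"                    # SpamAssassin local rules dir

rules = [
    "70sareadult.cf", "70saregenlsubj0.cf", "70sareheader0.cf",
    "70sarehtml0.cf", "70sareobfu0.cf", "70sareoem.cf",
    "70sarespoof.cf", "70sarestocks.cf", "70sareunsub.cf",
]

# (download URL, install path) pairs to feed to your fetch tool of choice
jobs = [(BASE_URL + r, os.path.join(DEST, r)) for r in rules]
for url, dest in jobs:
    print(url, "->", dest)
```

Remember to stop and restart mail services after copying the files, as the post says.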

  • Bulk Data

Is there any place from where we can get bulk data for practising? The data should represent some sort of schema object relationship. Bulk means the number of rows should be in crores (tens of millions), for working closely on explain plans with bulk data.

    Random data will only get you so far. It's fine for some types of bulk testing but it has two major flaws:
1) A lot of performance issues derive from skew in our data. Randomly generated values, while exhibiting clumps, are unlikely to have the extremes of data distribution which we see in real data. This includes things like variation in string length.
2) Generating keys is difficult. Sure, we can generate unique numeric keys with ROWNUM, but other types of uniqueness are harder, and wrangling foreign key relationships is a complete haemorrhoid.
So, what to do? Well, there are a number of data sets out there. The best place to look is InfoChimps (http://www.infochimps.com/datasets). This used to be a really great site, but the company is (not unreasonably) seeking to make money from their efforts, so they now restrict access to a lot of their data sets. Nevertheless, many sets are free (although registration is required) or else just links to externally hosted public data sets.
    Most of the data sets are CSVs, so there is a certain amount of work required to get them into a database. However, it's not too difficult with external tables, and that is also a useful training in its own right.
    Cheers, APC
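APC's two flaws of random data (no skew, hard keys) are worth seeing concretely. A small Python sketch that generates rows with a deliberately skewed status column, varying string lengths, and a ROWNUM-style unique key; the column values are invented for illustration:

```python
import random
import string

random.seed(42)  # deterministic for repeatable test data

def skewed_rows(n):
    """Generate n rows with a skewed status column and variable-length names."""
    rows = []
    for pk in range(n):                        # unique numeric key, like ROWNUM
        # 90/9/1 split: real data is rarely uniform across values
        status = random.choices(["ACTIVE", "CLOSED", "ERROR"],
                                weights=[90, 9, 1])[0]
        name_len = random.randint(1, 40)       # real string lengths vary widely
        name = "".join(random.choices(string.ascii_uppercase, k=name_len))
        rows.append((pk, status, name))
    return rows

rows = skewed_rows(10000)
print(len(rows))  # 10000
```

Loaded into a table, data like this exercises histograms and explain plans far more realistically than uniform random values.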

  • Report Builder Wizard and Parameter Creation with values from other data source e.g. data set or views for non-IT users or Business Analysts

    Hi,
    "Report Builder is a report authoring environment for business users who prefer to work in the Microsoft Office environment.
    You work with one report at a time. You can modify a published report directly from a report server. You can quickly build a report by adding items from the Report Part Gallery provided by report designers from your organization." - As mentioned
    on TechNet. 
I wonder how a non-technical business analyst can use Report Builder 3 to create ad-hoc reports/analyses with lists of parameters based on other datasets.
Do they need to learn T-SQL, or how to add and link a parameter in Report Builder? How can they add a parameter to a report? I'm not sure what I am missing from the whole idea behind Report Builder.
I have SQL Server 2012 Standard and Report Builder 3.0, and want to train non-technical users to create reports as they need without asking the IT department.
Everything seems simple and working except parameters with lists of values, e.g. sales year list, sales month list, gender, etc.
So how can they configure parameters based on other datasets?
A workaround in my mind is to create a report with most of the columns, add the most frequent parameters based on other datasets, and then have the non-technical user modify that report according to their needs, but that way we are still restricting users to a set of predefined reports.
I want functionality like Excel Power View parameters in Report Builder, which is driven from the source data but is only available in Excel 2013 onward, which most people don't have yet.
So how should Report Builder be used? If you have any other thoughts, workarounds, or guidance on the purpose of Report Builder, please let me know.
    Many thanks and Kind Regards,
    For quick review of new features, try virtual labs: http://msdn.microsoft.com/en-us/aa570323

    Hi Asam,
If we want to create a parameter that depends on another dataset, we can additionally create or add a dataset, embedded or shared, whose query contains query variables, then use the "Get values from a query" option to obtain the available values. For more details, please see: http://msdn.microsoft.com/en-us/library/dd283107.aspx
http://msdn.microsoft.com/en-us/library/dd220464.aspx
As to the Report Builder features, we can refer to the following articles: http://technet.microsoft.com/en-us/library/hh213578.aspx
http://technet.microsoft.com/en-us/library/hh965699.aspx
    Hope this helps.
    Thanks,
Katherine Xiong
    TechNet Community Support

  • Performance problems on bulk data importing

    Hello,
We are initially importing 3,500,000 customers and 9,000,000 sales orders from an external system into a CRM system. We developed ABAP programs that use standard BAPI functions to import the bulk data.
We have seen that this process will take a very long time to finish, approximately 1.5 months. That is far too long for us to wait; we want to complete the job in about a week.
Have we done something wrong? For example, are there other fast, SAP-standard ways to import bulk partners and sales orders without developing ABAP programs?
    best regards,
    Cuneyt Tektas

    Hi Cuneyt,
The SAP standard supports import from external sources. You can use the XIF adapter, or you can also use eCATT.
    Thanks,
    Vikash.

• How can I extract minutes from a MySQL time format

How can I extract minutes from a MySQL time format?
For example, 10:30:00 is stored in MySQL, and I want to create three variables which will store the hours, minutes and seconds.
Cheers.
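Two ways to approach this: on the SQL side, MySQL's HOUR(), MINUTE() and SECOND() functions pull the parts directly from a TIME value. If the string reaches the application first, splitting it is a one-liner; a sketch in Python:

```python
def split_time(t):
    """Split an HH:MM:SS time string into integer hours, minutes, seconds."""
    hours, minutes, seconds = (int(part) for part in t.split(":"))
    return hours, minutes, seconds

h, m, s = split_time("10:30:00")
print(h, m, s)  # 10 30 0
```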

    "use application date format" is the choice you want.
    Denes Kubicek
    http://deneskubicek.blogspot.com/
    http://www.opal-consulting.de/training
    http://apex.oracle.com/pls/otn/f?p=31517:1
    http://www.amazon.de/Oracle-APEX-XE-Praxis/dp/3826655494
    -------------------------------------------------------------------

  • Enhancement_CIN_Capture Incoming Excise Invoice-J1IEX bulk data upload

    Dear All,
    Sub:CIN_Capture Incoming Excise Invoice-J1IEX bulk data upload option requirement
    We are capturing the Incoming excise invoices manually in the
    transaction J1IEX with huge datau2019s and according to the volume of data
    it is very difficult for us to enter manually and now we required for
    the option of bulk data processing to upload the data from the Excel
    file(received the softcopy from the supplier).
    As per our observations we found the BAPI named
    BAPI_EXCINV_CREATE_FROMDATA but the update level of this BAPI is not
    available in our system because as per the Indian Government norms one
    ofthe current Excise Duty Tariff is as below
    1. Basic Excise Duty (BED 8%).
    2. Education Cess (ECess 2%)
    3. Secondary Education Cess (SECess 1%)
    and we observed the SECess (1%) is not covered in the above mentioned
    BAPI so we are not in a position to proceed further.
    So Kindly update us is any other relevant option will solve the purpose.
    We are in a quite difficult situation to uplaod the datas to our system
    so please do the needful.
    Regards,
    PrabuM

Please note that CIN uses the 'MB_MIGO_BADI' definition, and 'CIN_PLUG_IN_TO_MIGO' is the implementation. You can create multiple implementations of this BAdI. The same BAdI is used for single-step capture and posting of the excise invoice in MIGO. Kindly use this BAdI as per your needs. The SAP standard does not support any BAPIs for goods receipts with excise updates.

  • SAP Training from Remote (Internet based)

    SAP Training from Remote (Internet based)
    Instructor – Mr. Nanda Kishore
• IIT engineer with 10 years of SAP-specific training experience
• Trained hundreds of students now well placed in Big 4 consulting firms
• Official trainer for companies like Infosys, Triniti, Coconut and Emcure
• Experienced in ABAP, BW, XI
Upcoming training session - SAP ABAP
Batch starts August 13, 2007
Highlights
• Rigorous curriculum
• Techno-functional perspective on ABAP development
• Real-time scenarios
• Graded assignments
Target Audience
• Recent graduates seeking to start a career in SAP
• Programmers looking to transition to a promising career with SAP
• Functional consultants or industry process experts looking to enhance their ABAP programming & debugging skills
    Testimonials
    “… Great experience being taught by an IITIAN….I think my IQ increased after getting trained by him and I had offer from two companies Accenture and Bearing Point. He prepared me so thoroughly for client interview and life afterwards that not only I got selected in my first interview but till date I could not find a better consultant than myself. I am currently working as a senior consultant in Bearing Point.” – Arshad
    “ … Excellent trainer with great depth of knowledge. His exercises and curriculum are both very challenging. The whole training was a blast and there was an exponential increase in my knowledge all throughout the course.” – Hemant, EDS employee
    “I always use to think that I will some day jump to SAP technologies but with two kids I never had the energy and the stamina to change to SAP till I met Kishore. My income jumped up in six months time. Today I am a team lead with a big pharmaceutical company. Mr. Kishore is the one who fully helped me change my career path.” – Saurav 
    For more information (fees and connectivity) contact
    Email: [email protected]
    Tel: 732-529-4188

    Hi,
    All mentioned options are possible to access SAP Business One, like:
    Static IP address for SAP B1 server
    Citrix (application or remote desktop)
VPN (you can even set up a simple PPTP, L2TP or OpenVPN connection). But please be aware that VPN is not an SAP-recommended way of connecting to ERP systems.
There is a fourth nice alternative to avoid VPN and RDP in case you can't use either of them (because, for instance, local network security settings block you from setting up a connection). You can set up a clientless HTML RDP connection on the main server of the remote network (it need not be your SAP B1 server; you should do it on the main machine that has the static IP):
Guacamole - HTML5 Clientless Remote Desktop
In this case you get an RDP connection to your client or server machine with SAP B1 from any kind of network and any kind of device, with no additional client setup. It even works from a mobile phone, though of course the screen is too small.
Here is how it looks (in my local browser):
    Kind Regards,
    Siarhei

  • Training a ABC Data Mining (DM) model for a Web Template.

    Hello All,
I need to create a Web Template that uses the ABC web item. This web item needs a "trained" ABC Analysis DM model. The only tool I know of that can "train" a DM model is the APD; however, the APD does not support the ABC DM model as a data target. Is there any other way to "train" an ABC DM model, or are there gaps in my knowledge?
    Dorothy

    Hi,
Being new to mining, you have really set off on an ambitious mining project :)
A couple of technical pointers:
*1) Version of Data Miner being used*
You are using the original Data Miner release.
I would download the latest SQL Developer release, which contains the current Data Miner client and repository installation:
http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html
SQL Developer 3.2.2 RTM Version 3.2.20.09 Build MAIN-09.87
Drop the old repository and start with this latest one, assuming you are just getting started and have no significant mining workflows created.
You can always export the workflows to disk if you want to import them into the new repository.
Alternatively you can migrate the older repository, but I would avoid that unless you really need to, as it requires Data Miner to hold on to some older repository definitions.
    *2) Handling of text*
    It seems your primary source of data for the clustering process will be the cs_uri_query.
    You might find better results processing it as text data rather than as categorical data.
    You can use the Build Text node to transform cs_uri_query into a nested column that contains text tokens.
    *3) Methodology definition*
    This is probably your biggest challenge really.
    What is the overall methodology to produce the desired result.
    You stated your objective is: develop an intelligent recommend model based on queries recorded in the web log
    Once you create clusters from this data, what are your next steps?
    What type of recommendation do you want to generate?
    Thanks, Mark

  • Upload bulk data into sap?

    hi all,
Please let me know: are there any methods to upload bulk data into SAP, and can the same data then be modified, deleted, or added to? Please correct me if I am wrong. What I know is that we can do it with the LSMW method; as I am new to LSMW, please let me know where to hunt for LSMW documentation.
    thanks,
    john dias.

    Hi John-
According to SAP, the Data Transfer Workbench supports the automatic transfer of data into the system. The Workbench is particularly useful for business objects with large amounts of data. It guarantees that data is transferred efficiently and ensures that data in the system is consistent.
    Further, The Legacy System Migration Workbench (LSMW) is a tool recommended by SAP that you can use to transfer data once only or periodically from legacy systems into an R/3 System.
    For your purpose you might be helped by the following two links-
    'Data Transfer Workbench' - http://help.sap.com/saphelp_47x200/helpdata/en/0d/e211c5543e11d1895d0000e829fbbd/frameset.htm
    'Using the LSM Workbench for the Conversion' - http://help.sap.com/saphelp_46c/helpdata/en/0f/4829e3d30911d3a6e30060087832f8/frameset.htm
    Hope these links help,
    - Vik.

  • Bulk Data Upload

    Hi
We have a requirement to load bulk data, which will be a full dump (not incremental) in CSV format, almost every week from other applications.
This implies that I can drop my tables and rebuild them from the CSV files I receive.
I was just wondering if there is any really efficient tool or utility in Oracle (or outside it) to import huge amounts of data, apart from SQL*Loader, external tables and Data Pump.
    Regards
    Kapil

I don't know of any tool apart from SQL*Loader/external tables and Data Pump.
You may find tools which you can buy (and which claim to be really good).
Honestly, if you want to load flat-file data (gigabytes or kilobytes) into Oracle, there is nothing better than SQL*Loader, "if you use all its capabilities" (external tables and SQL*Loader are the same thing; just the wrapper is different).
    Cheers
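For the weekly full-dump reload described above, the drop-rebuild-load cycle looks roughly like this. Python's csv module with sqlite3 stands in for SQL*Loader and Oracle, and the table layout is invented:

```python
import csv
import io
import sqlite3

csv_dump = "id,name\n1,Alpha\n2,Beta\n3,Gamma\n"   # stand-in for the weekly dump

conn = sqlite3.connect(":memory:")
conn.execute("DROP TABLE IF EXISTS customers")      # full dump: drop and rebuild
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

reader = csv.reader(io.StringIO(csv_dump))
next(reader)                                        # skip the header row
conn.executemany("INSERT INTO customers VALUES (?, ?)", list(reader))
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 3
```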

  • How to do Bulk data transfer  using Web Service

In my application I have to write various web services, and the majority of them have to query a database and return bulk data (more than 10,000 rows) through the web service.
So I would like to ask what the efficient way of transferring bulk data over a web service is. At present I am returning the dataset as an XML string (built with StringBuilder) from the web service and consuming it at the client end.
Is there a better way to do this in a web service?
    My env:
    Front end can be in any other technology ,UI like C#
    Back end : Tomcat 6 on Java 6 with Axis2
    Thanks in advance

Innova wrote:
"But then I also have to mention a return type, although it is XML that is getting transferred. Can you provide me with a small sample? If I have an Emp object with properties EmpName, EmpID, DOJ, DOB, Dpt, what would the return type be in this case? My major concern is that my result set is going to be huge, more than 10,000 rows, so there is the time for forming the XML and then the transfer. How can I reduce the transfer time? I mean a faster, optimised approach for transferring large data as part of a web service. Simply put, how can I transfer bulk data in minimum time so that I can reduce the waiting time at the client side for the response? Thanks in advance."
I've done this with thousands of rows before now, and not had a performance problem... at least nothing worth writing home about.
Your return type will be an array of Emp objects. You can use a SOAP array for that (you'll need to look that up for the syntax, since I can't remember it off the top of my head), which gets restricted to being an array of Emp.
Should the normal technique prove too slow, then you should look at alternatives, but I would expect this to work well enough. And you have no faffing about with devising a new transfer system that then has to be explained to your client system... it will all be standard.
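Whatever return type is chosen, much of the wait for 10,000+ rows is raw payload size, and repetitive row data compresses extremely well, so enabling compression on the HTTP/SOAP transport is often the cheapest win. A quick illustration with Python's standard library; the Emp field names follow the post:

```python
import gzip
import json

# Build a fake result set shaped like the Emp rows discussed above
rows = [{"EmpName": "emp%d" % i, "EmpID": i, "Dpt": i % 10}
        for i in range(10000)]

payload = json.dumps(rows).encode("utf-8")   # the serialized response body
compressed = gzip.compress(payload)          # what the wire would carry
print(len(compressed) < len(payload))  # True: repetitive rows compress well
```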
