Crawler setting

How do we set the crawler to crawl only internal pages, so that no external links are followed?
If we crawl www.oracle.com, only pages on the www.oracle.com web site should get indexed.
Regards and many thanks.
Zakaria Ben-Salem

Hi,
which version of Ultra Search are you using?
With 9.0.2, we have URL inclusion/exclusion support, in which you can
control the crawling scope. This is a step in data source creation.
David
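
(If you are on an older release without inclusion/exclusion rules, the same restriction can be approximated in any crawler by checking each discovered link against the host of the start URL before queueing it. Below is a minimal sketch of that idea in Java; the class and method names are illustrative only, not part of the Ultra Search API.)

    import java.net.URL;

    public class HostScopeFilter {
      private final String allowedHost;

      public HostScopeFilter(String startUrl) throws Exception {
        // Remember the host of the start URL, e.g. "www.oracle.com".
        allowedHost = new URL(startUrl).getHost().toLowerCase();
      }

      // Accept a link only if it stays on the start host;
      // everything external is excluded from the crawl.
      public boolean inScope(String link) {
        try {
          return new URL(link).getHost().toLowerCase().equals(allowedHost);
        } catch (Exception e) {
          return false; // malformed URLs are out of scope
        }
      }
    }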

Similar Messages

  • Incremental crawl does not start automatically

    In Central Admin, once upon a time SharePoint would generate an automatic incremental crawl set for every 60 minutes throughout the day. But something has "happened". When I manually start an incremental crawl, it says "Starting", but then after
    a couple of minutes the status says "Idle", whereas before I would get a "Completed" status.
    I'm not getting any errors when I start a crawl. 
    Can someone tell me what I should check to see why the incremental crawl is not crawling like it should?
    artisticweb

    Hi artisticweb,
    I started an incremental crawl on my SharePoint 2013 server; the status went through "Starting", "Crawling Incremental", "Completing", and "Idle". If I didn't click Refresh, it went straight to "Idle" after a couple of minutes.
    Please go to Search Administration > Diagnostics > Crawl Log > Crawl History, and see if incremental crawl is recorded there.
    Regards,
    Rebecca Tu
    TechNet Community Support

  • How to access Properties in Custom Crawler

    [Plumtree 5.0.2 / .NET]
    Hi,
    Where in the source code for the NT File Crawler does the accessor (in this case the html accessor) pick up the metadata from the document and input the values into the Plumtree Cards?
    To explain - we are developing a custom crawler that will crawl html formatted files containing metadata about the documents to be indexed, and not crawling the original documents themselves. (I can explain the reasons for this in more detail if relevant)
    One of the values included in this metadata is the location of the original file which I need the crawler to make the ClickThroughURL.
    Rather than parsing the html document myself, I figured the HTML Accessor is already doing the job for me. The field is mapped as a property in the document type for this crawler, and I just need to work out how to access that to replace:
    data.ClickThroughURL = CurrentFile.FullName;
    with:
    data.ClickThroughURL = MY_VALUE_FROM_PROPERTIES;
    I'm struggling a bit through lack of documentation, but I thought that if I could find the section of the code where the crawler sets the properties for each document, I'd be able to see how I can access the collection. I'd thought it would be in IDocument.GetMetaData, but I can't spot it.
    I hope this makes sense to someone out there....
    Thanks,
    Ani

    Hi Clinton,
    I checked PTSpy and it appears to be looking at the right URL, but receiving a 404. PTSpy records the following, (I've changed the server name but it was correct):
    Entering Function CPTUserCookieDatabase::AddHeadersToCookies: parameters: lpctstrDomain = APP_SERVER_WHERE_CRAWLER_LOCATED.hosting.companyname.net; lpctstrObject = /ANITA/DJCustomCrawlerWS/XUIService.asmx; lpctstrHeaders = HTTP/1.1 404 Not Found | Server: Microsoft-IIS/5.0 | Date: Wed, 02 Mar 2005 03:12:54 GMT | X-Powered-By: ASP.NET | X-AspNet-Version: 1.1.4322 | Cache-Control: private | Content-Type: text/html; charset=utf-8 | Content-Length: 1530; lCookieSegment = -1;
    I wondered if it's trying to look for "/ANITA/DJCustomCrawlerWS/XUIService.asmx" on the portal server or whether it's looking in the lpctstrDomain where the file actually exists. Would you expect to see a fully qualified URL for lpctstrObject? Either way, as I said, there are no 404s recorded in the IIS logs on either server.
    ==================================
    Back to the question of getting a customized property value from the Crawler. Can you explain a little more about "CrawlerConstants.TAG_PROPERTIES" and what I'm looking for please? I can see that this is getting its value from the web.config key "UseRemoteProperties" which in my instance had the value of "0" i.e. USE_LOCAL.
    The custom property I want to get the value of is called "FileLocation". When I go to the knowledge directory and check the properties page of the documents I'm crawling, the value is right there in the "Customized Properties", exactly as expected, whether I've got "UseRemoteProperties" set to 0 or 1 (and it's refreshed any time I change it and re-run the crawl).
    In Document.cs (GetMetaData()) the DocumentMetaData() object is instantiated and the method Put() is used to populate some of the document's property values, and there is the comment that they are overwritten by the accessor depending on the "proper setting in the web.config file" which is the CrawlerConstants.TAG_PROPERTIES that you spoke of.
    So where does the accessor do this? Somewhere, something (the accessor presumably) is writing all the other custom properties to the card. I've tried to use the Get() method for DocumentMetaData:
    data.Get("FileLocation");
    to retrieve the value for the one I'm after, but when I write this out to the log it's always blank; the value hasn't been set yet. Where and how can I retrieve this value to update the ClickThroughURL with it?
    Thank you again. Please shout if I haven't explained this very well; I'm wondering whether my lack of understanding is making it hard for me to phrase my questions, but I'm getting desperate, having spent far too much time and money on this.
    Ani
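
    The general pattern being asked about here, independent of the Plumtree API, is: once the accessor has parsed the document's metadata into a property map, look up the mapped field and use its value to override the click-through URL, falling back to the file path when the field is absent. A small Java sketch of that flow, with hypothetical types (DocumentRecord, the "props" map) standing in for the real crawler classes:

    import java.util.*;

    class DocumentRecord {
      String clickThroughURL;
    }

    class ClickThroughRewriter {
      /* Hypothetical sketch: "props" stands for the metadata the
         accessor extracted from the HTML file, keyed by field name. */
      void apply(DocumentRecord data, Map props, String defaultUrl) {
        Object fileLocation = props.get("FileLocation");
        if (fileLocation != null) {
          // Use the original file's location from the metadata...
          data.clickThroughURL = fileLocation.toString();
        } else {
          // ...otherwise fall back to the crawled file's own path.
          data.clickThroughURL = defaultUrl;
        }
      }
    }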

  • Crawling a Portal directory

    Hi all,
    I'm back again with what's probably another obvious question. Here's the situation: I'm trying to crawl the contents of some folders from one Portal into another.
    I've created another World Wide Web data source and tried to create a crawler that uses it.
    I've included in the crawler setup a URL to the knowledge directory.
    Within the data source I've given it a valid user name and password for the Portal to be crawled. I've also provided details of the portal login form to try to get access.
    These are as follows.
    Login URL: http://portalv5dev2.wiley.com:8080/portal/server.pt?
    Post URL: http://portalv5dev2.wiley.com:8080/portal/server.pt?
    Then added form fields as follows:
    in_hi_space = Login, in_tx_username=CrawlerUser, in_pw_userpass=Crawlerpw, in_se_authsource="", in_hidologin=true, in_hi_spaceID = 0, in_hi_control=Login
    When the job runs it reports
    Error in getting child Documents and Containers (for node Crawler Start Node) from the Data Source Provider: IDispatch error #17152 (0x80044500): [CPTWebCrawlProvider::GetMIMEType, could not open 'http://portalv5dev2.wiley.com:8080/portal/server.pt?space=Dir&spaceID=1&parentname=MyPage&parentid=0&in_hi_userid=1&control=OpenSubFolder&subfolderID=2109&DirMode=1'(0x80044f65) <unknown error>]
    Is it possible to crawl another portal directory, or should I develop my own portlets?
    Thanks for any help. Adam

    I have created a portal self-crawl. I did this by creating an experience definition without SSO and a crawler user with a Snapshot Query portlet, and by using direct URL entry in the web crawler, with the starting page set to the login page action and the user ID, password, and other form elements as parameters and values. It logs in to the home page and finds the Snapshot Query portlet and all the page URLs in the snapshot query. It starts to crawl, but it seems to hit the login page instead of the actual community page.
    It looks like the normal browser-based scenario: if I log into the portal and then delete all my cookies, I too get a login page if I click any community page URL. The cookie seems to be jsessionid. This is true for an SSO-disabled experience definition as well.
    Can you please tell me the settings required in the WWW content source to log in to the portal and turn a self-crawl into a crawl of all portal pages, making them full-text indexed with correct names? I tried several settings and impersonating the user, but could not be successful.
    PS: Currently the crawler saves all pages as Login Page (1)(2)(3)... instead of the actual page name. I guess it takes the name from the <title> tag, but since it cannot get into the pages and hits the login page, it just saves them under that name.
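
    The mechanics described above (submit the login form once, then replay the session cookie on every subsequent request) can be sketched in plain Java. The host and form field values below are placeholders, not the actual portal settings:

    import java.net.*;

    public class LoginCrawlSketch {
      public static void main(String[] args) throws Exception {
        // Install a cookie store so the jsessionid returned by the
        // login response is replayed on every later request.
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(cookies);

        // POST the login form fields once (names are illustrative).
        URL login = new URL("http://portal.example.com/portal/server.pt");
        HttpURLConnection conn = (HttpURLConnection) login.openConnection();
        conn.setDoOutput(true);
        String form = "in_tx_username=CrawlerUser&in_pw_userpass=Crawlerpw&in_hidologin=true";
        conn.getOutputStream().write(form.getBytes("UTF-8"));
        conn.getResponseCode(); // read the response so the cookie is captured

        // Later fetches now carry the session cookie automatically, so the
        // crawler sees the real community pages instead of the login page.
        URL page = new URL("http://portal.example.com/portal/server.pt?space=SomeCommunity");
        java.io.InputStream in = page.openStream();
        in.close();
      }
    }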

  • Crawling the Web with Java

    Hi everyone.
    I've been trying to learn how to make a web crawler in java following this detailed albeit old tutorial: http://www.devarticles.com/c/a/Java/Crawling-the-Web-with-Java/
    The SearchCrawler class they feature can be found in two parts:
    First part: http://www.devarticles.com/c/a/Java/Crawling-the-Web-with-Java/3/
    Second part: http://www.devarticles.com/c/a/Java/Crawling-the-Web-with-Java/4/
    I don't want to copy and paste the code because it is really long and an eyesore if viewing here.
    I get a lot of errors when compiling that I do not understand. The majority of the errors (62 of them, to be precise) are "class, interface or enum expected" errors, with the remaining few being "illegal start of type" and "<identifier> expected" errors.
    Can someone here perhaps take a look at it and compile it and see if they also get the same errors? I realise it is an old tutorial but there are hardly any detailed resources I can find for java web crawlers.
    Thanks.

    Odd I can't seem to log into my account. Never mind.
    I have used Java before; the problem here, I suppose, is that I'm not good enough to spot what's wrong. The code seems fine bracket-wise and it has really left me stumped.
    If someone could put it in their editor and attempt to compile it, to see whether I'm the only one having a problem, that would be much appreciated.
    For your convenience... the code from the example I linked:
    import java.awt.*;
    import java.awt.event.*;
    import java.io.*;
    import java.net.*;
    import java.util.*;
    import java.util.regex.*;
    import javax.swing.*;
    import javax.swing.table.*;
    // The Search Web Crawler
    public class SearchCrawler extends JFrame {
      // Max URLs drop-down values.
      private static final String[] MAX_URLS =
        {"50", "100", "500", "1000"};
      // Cache of robot disallow lists.
      private HashMap disallowListCache = new HashMap();
      // Search GUI controls.
      private JTextField startTextField;
      private JComboBox maxComboBox;
      private JCheckBox limitCheckBox;
      private JTextField logTextField;
      private JTextField searchTextField;
      private JCheckBox caseCheckBox;
      private JButton searchButton;
      // Search stats GUI controls.
      private JLabel crawlingLabel2;
      private JLabel crawledLabel2;
      private JLabel toCrawlLabel2;
      private JProgressBar progressBar;
      private JLabel matchesLabel2;
      // Table listing search matches.
      private JTable table;
      // Flag for whether or not crawling is underway.
      private boolean crawling;
      // Matches log file print writer.
      private PrintWriter logFileWriter;
      // Constructor for Search Web Crawler.
      public SearchCrawler() {
        // Set application title.
        setTitle("Search Crawler");
        // Set window size.
        setSize(600, 600);
         // Handle window closing events.
        addWindowListener(new WindowAdapter() {
          public void windowClosing(WindowEvent e) {
            actionExit();
          }
        });
        // Set up File menu.
        JMenuBar menuBar = new JMenuBar();
        JMenu fileMenu = new JMenu("File"); 
        fileMenu.setMnemonic(KeyEvent.VK_F);
        JMenuItem fileExitMenuItem = new JMenuItem("Exit",
          KeyEvent.VK_X);
        fileExitMenuItem.addActionListener(new ActionListener() {
          public void actionPerformed(ActionEvent e) {
            actionExit();
          }
        });
        fileMenu.add(fileExitMenuItem);
        menuBar.add(fileMenu);
        setJMenuBar(menuBar);
        // Set up search panel.
        JPanel searchPanel = new JPanel();
        GridBagConstraints constraints;
        GridBagLayout layout = new GridBagLayout();
        searchPanel.setLayout(layout);
        JLabel startLabel = new JLabel("Start URL:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST; 
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(startLabel, constraints);
        searchPanel.add(startLabel);
        startTextField = new JTextField();
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 0, 5);
        layout.setConstraints(startTextField, constraints);
        searchPanel.add(startTextField);
        JLabel maxLabel = new JLabel("Max URLs to Crawl:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST;
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(maxLabel, constraints);
        searchPanel.add(maxLabel);
        maxComboBox = new JComboBox(MAX_URLS);
        maxComboBox.setEditable(true);
        constraints = new GridBagConstraints();
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(maxComboBox, constraints);
        searchPanel.add(maxComboBox);
        limitCheckBox =
          new JCheckBox("Limit crawling to Start URL site");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.WEST;
        constraints.insets = new Insets(0, 10, 0, 0);
        layout.setConstraints(limitCheckBox, constraints);
        searchPanel.add(limitCheckBox);
        JLabel blankLabel = new JLabel();
        constraints = new GridBagConstraints();
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        layout.setConstraints(blankLabel, constraints);
        searchPanel.add(blankLabel);
        JLabel logLabel = new JLabel("Matches Log File:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST;
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(logLabel, constraints);
        searchPanel.add(logLabel);
        String file =
          System.getProperty("user.dir") +
          System.getProperty("file.separator") +
          "crawler.log";
        logTextField = new JTextField(file);
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 0, 5);
        layout.setConstraints(logTextField, constraints);
        searchPanel.add(logTextField);
        JLabel searchLabel = new JLabel("Search String:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST; 
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(searchLabel, constraints);
        searchPanel.add(searchLabel);
        searchTextField = new JTextField();
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.insets = new Insets(5, 5, 0, 0);
        constraints.gridwidth= 2;
        constraints.weightx = 1.0d;
        layout.setConstraints(searchTextField, constraints);
        searchPanel.add(searchTextField);
        caseCheckBox = new JCheckBox("Case Sensitive");
        constraints = new GridBagConstraints();
        constraints.insets = new Insets(5, 5, 0, 5);
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        layout.setConstraints(caseCheckBox, constraints);
        searchPanel.add(caseCheckBox);
        searchButton = new JButton("Search");
        searchButton.addActionListener(new ActionListener() {
          public void actionPerformed(ActionEvent e) {
            actionSearch();
          }
        });
        constraints = new GridBagConstraints();
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 5, 5);
        layout.setConstraints(searchButton, constraints);
        searchPanel.add(searchButton);
        JSeparator separator = new JSeparator();
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 5, 5);
        layout.setConstraints(separator, constraints);
        searchPanel.add(separator);
        JLabel crawlingLabel1 = new JLabel("Crawling:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST;
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(crawlingLabel1, constraints);
        searchPanel.add(crawlingLabel1);
        crawlingLabel2 = new JLabel();
        crawlingLabel2.setFont(
          crawlingLabel2.getFont().deriveFont(Font.PLAIN));
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 0, 5);
        layout.setConstraints(crawlingLabel2, constraints);
        searchPanel.add(crawlingLabel2);
        JLabel crawledLabel1 = new JLabel("Crawled URLs:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST;
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(crawledLabel1, constraints);
        searchPanel.add(crawledLabel1);
        crawledLabel2 = new JLabel();
        crawledLabel2.setFont(
          crawledLabel2.getFont().deriveFont(Font.PLAIN));
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 0, 5);
        layout.setConstraints(crawledLabel2, constraints);
        searchPanel.add(crawledLabel2);
        JLabel toCrawlLabel1 = new JLabel("URLs to Crawl:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST;
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(toCrawlLabel1, constraints);
        searchPanel.add(toCrawlLabel1);
        toCrawlLabel2 = new JLabel();
        toCrawlLabel2.setFont(
          toCrawlLabel2.getFont().deriveFont(Font.PLAIN));
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 0, 5);
        layout.setConstraints(toCrawlLabel2, constraints);
        searchPanel.add(toCrawlLabel2);
        JLabel progressLabel = new JLabel("Crawling Progress:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST;
        constraints.insets = new Insets(5, 5, 0, 0);
        layout.setConstraints(progressLabel, constraints);
        searchPanel.add(progressLabel);
        progressBar = new JProgressBar();
        progressBar.setMinimum(0);
        progressBar.setStringPainted(true);
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 0, 5);
        layout.setConstraints(progressBar, constraints);
        searchPanel.add(progressBar);
        JLabel matchesLabel1 = new JLabel("Search Matches:");
        constraints = new GridBagConstraints();
        constraints.anchor = GridBagConstraints.EAST;
        constraints.insets = new Insets(5, 5, 10, 0);
        layout.setConstraints(matchesLabel1, constraints);
        searchPanel.add(matchesLabel1);
        matchesLabel2 = new JLabel();
        matchesLabel2.setFont(
          matchesLabel2.getFont().deriveFont(Font.PLAIN));
        constraints = new GridBagConstraints();
        constraints.fill = GridBagConstraints.HORIZONTAL;
        constraints.gridwidth = GridBagConstraints.REMAINDER;
        constraints.insets = new Insets(5, 5, 10, 5);
        layout.setConstraints(matchesLabel2, constraints);
        searchPanel.add(matchesLabel2);
        // Set up matches table.
        table =
          new JTable(new DefaultTableModel(new Object[][]{},
            new String[]{"URL"}) {
          public boolean isCellEditable(int row, int column) {
            return false;
          }
        });
        // Set up Matches panel.
        JPanel matchesPanel = new JPanel();
        matchesPanel.setBorder(
          BorderFactory.createTitledBorder("Matches"));
        matchesPanel.setLayout(new BorderLayout());
        matchesPanel.add(new JScrollPane(table),
          BorderLayout.CENTER);
        // Add panels to display.
        getContentPane().setLayout(new BorderLayout());
        getContentPane().add(searchPanel, BorderLayout.NORTH);
        getContentPane().add(matchesPanel, BorderLayout.CENTER);
      }
      // Exit this program.
      private void actionExit() {
        System.exit(0);
      }
      // Handle Search/Stop button being clicked.
      private void actionSearch() {
        // If stop button clicked, turn crawling flag off.
        if (crawling) {
          crawling = false;
          return;
        }
      ArrayList errorList = new ArrayList();
      // Validate that start URL has been entered.
      String startUrl = startTextField.getText().trim();
      if (startUrl.length() < 1) {
        errorList.add("Missing Start URL.");
      }
      // Verify start URL.
      else if (verifyUrl(startUrl) == null) {
        errorList.add("Invalid Start URL.");
      }
      // Validate that Max URLs is either empty or is a number.
      int maxUrls = 0;
      String max = ((String) maxComboBox.getSelectedItem()).trim();
      if (max.length() > 0) {
        try {
          maxUrls = Integer.parseInt(max);
        } catch (NumberFormatException e) {
          // Leave maxUrls at 0 so the range check below reports the error.
        }
        if (maxUrls < 1) {
          errorList.add("Invalid Max URLs value.");
        }
      }
      // Validate that matches log file has been entered.
      String logFile = logTextField.getText().trim();
      if (logFile.length() < 1) {
        errorList.add("Missing Matches Log File.");
      }
      // Validate that search string has been entered.
      String searchString = searchTextField.getText().trim();
      if (searchString.length() < 1) {
        errorList.add("Missing Search String.");
      }
      // Show errors, if any, and return.
      if (errorList.size() > 0) {
        StringBuffer message = new StringBuffer();
        // Concatenate errors into single message.
        for (int i = 0; i < errorList.size(); i++) {
          message.append(errorList.get(i));
          if (i + 1 < errorList.size()) {
            message.append("\n");
          }
        }
        showError(message.toString());
        return;
      }
      // Remove "www" from start URL if present.
      startUrl = removeWwwFromUrl(startUrl);
      // Start the Search Crawler.
      search(logFile, startUrl, maxUrls, searchString);
      }
    private void search(final String logFile, final String startUrl,
      final int maxUrls, final String searchString)
    {
      // Start the search in a new thread.
      Thread thread = new Thread(new Runnable() {
        public void run() {
          // Show hour glass cursor while crawling is under way.
          setCursor(Cursor.getPredefinedCursor(Cursor.WAIT_CURSOR));
          // Disable search controls.
          startTextField.setEnabled(false);
          maxComboBox.setEnabled(false);
          limitCheckBox.setEnabled(false);
          logTextField.setEnabled(false);
          searchTextField.setEnabled(false);
          caseCheckBox.setEnabled(false);
          // Switch Search button to "Stop."
          searchButton.setText("Stop");
          // Reset stats.
          table.setModel(new DefaultTableModel(new Object[][]{},
            new String[]{"URL"}) {
            public boolean isCellEditable(int row, int column) {
              return false;
            }
          });
          updateStats(startUrl, 0, 0, maxUrls);
          // Open matches log file.
          try {
            logFileWriter = new PrintWriter(new FileWriter(logFile));
          } catch (Exception e) {
            showError("Unable to open matches log file.");
            return;
          }
          // Turn crawling flag on.
          crawling = true;
          // Perform the actual crawling.
          crawl(startUrl, maxUrls, limitCheckBox.isSelected(),
            searchString, caseCheckBox.isSelected());
          // Turn crawling flag off.
          crawling = false;
          // Close matches log file.
          try {
            logFileWriter.close();
          } catch (Exception e) {
            showError("Unable to close matches log file.");
          }
          // Mark search as done.
          crawlingLabel2.setText("Done");
          // Enable search controls.
          startTextField.setEnabled(true);
          maxComboBox.setEnabled(true);
          limitCheckBox.setEnabled(true);
          logTextField.setEnabled(true);
          searchTextField.setEnabled(true);
          caseCheckBox.setEnabled(true);
          // Switch search button back to "Search."
          searchButton.setText("Search");
          // Return to default cursor.
          setCursor(Cursor.getDefaultCursor());
          // Show message if search string not found.
          if (table.getRowCount() == 0) {
            JOptionPane.showMessageDialog(SearchCrawler.this,
              "Your Search String was not found. Please try another.",
              "Search String Not Found",
              JOptionPane.WARNING_MESSAGE);
          }
        }
      });
      thread.start();
    }
    // Show dialog box with error message.
    private void showError(String message) {
      JOptionPane.showMessageDialog(this, message, "Error",
        JOptionPane.ERROR_MESSAGE);
    }
    // Update crawling stats.
    private void updateStats(
      String crawling, int crawled, int toCrawl, int maxUrls)
    {
      crawlingLabel2.setText(crawling);
      crawledLabel2.setText("" + crawled);
      toCrawlLabel2.setText("" + toCrawl);
      // Update progress bar.
      if (maxUrls == -1) {
        progressBar.setMaximum(crawled + toCrawl);
      } else {
        progressBar.setMaximum(maxUrls);
      }
      progressBar.setValue(crawled);
      matchesLabel2.setText("" + table.getRowCount());
    }
    // Add match to matches table and log file.
    private void addMatch(String url) {
      // Add URL to matches table.
      DefaultTableModel model =
        (DefaultTableModel) table.getModel();
      model.addRow(new Object[]{url});
      // Add URL to matches log file.
      try {
        logFileWriter.println(url);
      } catch (Exception e) {
        showError("Unable to log match.");
      }
    }
    // Verify URL format.
    private URL verifyUrl(String url) {
      // Only allow HTTP URLs.
      if (!url.toLowerCase().startsWith("http://"))
        return null;
      // Verify format of URL.
      URL verifiedUrl = null;
      try {
        verifiedUrl = new URL(url);
      } catch (Exception e) {
        return null;
      }
      return verifiedUrl;
    }
    // Check if robot is allowed to access the given URL.
    private boolean isRobotAllowed(URL urlToCheck) {
      String host = urlToCheck.getHost().toLowerCase();
      // Retrieve host's disallow list from cache.
      ArrayList disallowList =
        (ArrayList) disallowListCache.get(host);
      // If list is not in the cache, download and cache it.
      if (disallowList == null) {
        disallowList = new ArrayList();
        try {
          URL robotsFileUrl =
            new URL("http://" + host + "/robots.txt");
          // Open connection to robot file URL for reading.
          BufferedReader reader =
            new BufferedReader(new InputStreamReader(
              robotsFileUrl.openStream()));
          // Read robot file, creating list of disallowed paths.
          String line;
          while ((line = reader.readLine()) != null) {
            if (line.indexOf("Disallow:") == 0) {
              String disallowPath =
                line.substring("Disallow:".length());
              // Check disallow path for comments and remove if present.
              int commentIndex = disallowPath.indexOf("#");
              if (commentIndex != -1) {
                disallowPath =
                  disallowPath.substring(0, commentIndex);
              }
              // Remove leading or trailing spaces from disallow path.
              disallowPath = disallowPath.trim();
              // Add disallow path to list.
              disallowList.add(disallowPath);
            }
          }
          // Add new disallow list to cache.
          disallowListCache.put(host, disallowList);
        }
        catch (Exception e) {
          /* Assume robot is allowed since an exception
             is thrown if the robot file doesn't exist. */
          return true;
        }
      }
      /* Loop through disallow list to see if
         crawling is allowed for the given URL. */
      String file = urlToCheck.getFile();
      for (int i = 0; i < disallowList.size(); i++) {
        String disallow = (String) disallowList.get(i);
        if (file.startsWith(disallow)) {
          return false;
        }
      }
      return true;
    }
    // Download page at given URL.
    private String downloadPage(URL pageUrl) {
      try {
        // Open connection to URL for reading.
        BufferedReader reader =
          new BufferedReader(new InputStreamReader(
            pageUrl.openStream()));
        // Read page into buffer.
        String line;
        StringBuffer pageBuffer = new StringBuffer();
        while ((line = reader.readLine()) != null) {
          pageBuffer.append(line);
        }
        return pageBuffer.toString();
      } catch (Exception e) {
      }
      return null;
    }
    // Remove leading "www" from a URL's host if present.
    private String removeWwwFromUrl(String url) {
      int index = url.indexOf("://www.");
      if (index != -1) {
        return url.substring(0, index + 3) +
          url.substring(index + 7);
      }
      return (url);
    }
    // Parse through page contents and retrieve links.
    private ArrayList retrieveLinks(
      URL pageUrl, String pageContents, HashSet crawledList,
      boolean limitHost)
    {
      // Compile link matching pattern.
      Pattern p =
        Pattern.compile("<a\\s+href\\s*=\\s*\"?(.*?)[\"|>]",
          Pattern.CASE_INSENSITIVE);
      Matcher m = p.matcher(pageContents);
      // Create list of link matches.
      ArrayList linkList = new ArrayList();
      while (m.find()) {
        String link = m.group(1).trim();
        // Skip empty links.
        if (link.length() < 1) {
          continue;
        }
        // Skip links that are just page anchors.
        if (link.charAt(0) == '#') {
          continue;
        }
        // Skip mailto links.
        if (link.indexOf("mailto:") != -1) {
          continue;
        }
        // Skip JavaScript links.
        if (link.toLowerCase().indexOf("javascript") != -1) {
          continue;
        }
        // Prefix absolute and relative URLs if necessary.
        if (link.indexOf("://") == -1) {
          // Handle absolute URLs.
          if (link.charAt(0) == '/') {
            link = "http://" + pageUrl.getHost() + link;
          // Handle relative URLs.
          } else {
            String file = pageUrl.getFile();
            if (file.indexOf('/') == -1) {
              link = "http://" + pageUrl.getHost() + "/" + link;
            } else {
              String path =
                file.substring(0, file.lastIndexOf('/') + 1);
              link = "http://" + pageUrl.getHost() + path + link;
            }
          }
        }
        // Remove anchors from link.
        int index = link.indexOf('#');
        if (index != -1) {
          link = link.substring(0, index);
        }
        // Remove leading "www" from URL's host if present.
        link = removeWwwFromUrl(link);
        // Verify link and skip if invalid.
        URL verifiedLink = verifyUrl(link);
        if (verifiedLink == null) {
          continue;
        }
        /* If specified, limit links to those
           having the same host as the start URL. */
        if (limitHost &&
            !pageUrl.getHost().toLowerCase().equals(
              verifiedLink.getHost().toLowerCase()))
        {
          continue;
        }
        // Skip link if it has already been crawled.
        if (crawledList.contains(link)) {
          continue;
        }
        // Add link to list.
        linkList.add(link);
      }
      return (linkList);
    }
    /* Determine whether or not search string is
       matched in the given page contents. */
    private boolean searchStringMatches(
      String pageContents, String searchString,
      boolean caseSensitive)
    {
      String searchContents = pageContents;
      /* For a case-insensitive search, lowercase
         the page contents for comparison. */
      if (!caseSensitive) {
        searchContents = pageContents.toLowerCase();
      }
      // Split search string into individual terms.
      Pattern p = Pattern.compile("[\\s]+");
      String[] terms = p.split(searchString);
      // Check to see if each term matches.
      for (int i = 0; i < terms.length; i++) {
        if (caseSensitive) {
          if (searchContents.indexOf(terms[i]) == -1) {
            return false;
          }
        } else {
          if (searchContents.indexOf(terms[i].toLowerCase()) == -1) {
            return false;
          }
        }
      }
      return true;
    }
    // Perform the actual crawling, searching for the search string.
    public void crawl(
      String startUrl, int maxUrls, boolean limitHost,
      String searchString, boolean caseSensitive)
    {
      // Set up crawl lists.
      HashSet crawledList = new HashSet();
      LinkedHashSet toCrawlList = new LinkedHashSet();
      // Add start URL to the to crawl list.
      toCrawlList.add(startUrl);
      /* Perform actual crawling by looping
         through the To Crawl list. */
      while (crawling && toCrawlList.size() > 0)
      {
        /* Check to see if the max URL count has
           been reached, if it was specified. */
        if (maxUrls != -1) {
          if (crawledList.size() == maxUrls) {
            break;
          }
        }
        // Get URL at bottom of the list.
        String url = (String) toCrawlList.iterator().next();
        // Remove URL from the To Crawl list.
        toCrawlList.remove(url);
        // Convert string url to URL object.
        URL verifiedUrl = verifyUrl(url);
        // Skip URL if robots are not allowed to access it.
        if (!isRobotAllowed(verifiedUrl)) {
          continue;
        }
        // Update crawling stats.
        updateStats(url, crawledList.size(), toCrawlList.size(),
          maxUrls);
        // Add page to the crawled list.
        crawledList.add(url);
        // Download the page at the given URL.
        String pageContents = downloadPage(verifiedUrl);
        /* If the page was downloaded successfully, retrieve all its
           links and then see if it contains the search string. */
        if (pageContents != null && pageContents.length() > 0)
        {
          // Retrieve list of valid links from page.
          ArrayList links =
            retrieveLinks(verifiedUrl, pageContents, crawledList,
              limitHost);
          // Add links to the To Crawl list.
          toCrawlList.addAll(links);
          /* Check if search string is present in
             page, and if so, record a match. */
          if (searchStringMatches(pageContents, searchString,
              caseSensitive))
          {
            addMatch(url);
          }
        }
        // Update crawling stats.
        updateStats(url, crawledList.size(), toCrawlList.size(),
          maxUrls);
      }
    }
    // Run the Search Crawler.
    public static void main(String[] args) {
      SearchCrawler crawler = new SearchCrawler();
      crawler.show();
    }
    }

  • Indexes in 'Pending' status...

    Hi all,
    we are facing a problem with our TREX 7.0. I created two indexes with the relevant data assigned to them and clicked on 'Reindex' (with the standard crawler set), but both of them are stuck in 'Pending' status, with no files processed even several hours later.
    Might this have anything to do with access rights on the system TREX is running on? (It is separate from the box where the portal is running.) Where can I get some more information about what is happening behind all that? I suppose something must have happened...
    Any help/advice will be very much appreciated.
    Regards,
    Frank

    Bingo! Good job my friend!
    Full points for you!
    Regards,
    Frank

  • How do I optimize the code

    Here is the code for indexing.
    I want to optimize the code.
    Please give me some suggestions.
    Thank you,
    mitesh...
    package search_engine;

    import java.io.*;
    import java.sql.*;

    public class Indexer implements Runnable {

        public static boolean stopFlag = true;

        private final String dirName = "c:/search/repository";
        private final String[] fList = new File(dirName).list();

        public void run() {
            int fileNum;
            try {
                // Resume from the last indexed file number.
                FileReader indCountRead = new FileReader("c:/search/resources/indexcount.txt");
                StreamTokenizer countTok = new StreamTokenizer(indCountRead);
                countTok.resetSyntax();
                countTok.wordChars(33, 65535);
                countTok.whitespaceChars(0, ' ');
                countTok.eolIsSignificant(false);
                countTok.nextToken();
                fileNum = Integer.parseInt(countTok.sval);
                indCountRead.close();

                while (stopFlag && fileNum <= fList.length) {
                    String s = "c:/search/repository/doc" + fileNum + ".txt";
                    try {
                        FileReader fr = new FileReader(s);
                        StreamTokenizer tok = new StreamTokenizer(fr);
                        tok.resetSyntax();
                        tok.wordChars(33, 65535);
                        tok.whitespaceChars(0, ' ');
                        tok.eolIsSignificant(false);

                        Table tab = new Table();
                        tok.nextToken();
                        String s1 = tok.sval; // the first token of each file is its source URL
                        while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                            if (!tab.avoidToken(tok.sval)) {
                                if (!tab.matchedRecord(tok.sval, s1)) {
                                    tab.insertRecord(tok.sval, s1);
                                } else {
                                    tab.updateRecord(tok.sval, s1);
                                }
                            }
                        }
                        fr.close();
                    } catch (IOException e) {
                        System.err.println("File error: " + e.getMessage());
                    }
                    fileNum++;
                } // end outer while

                // Persist the counter so the next run continues where this one stopped.
                FileWriter indCountWrite = new FileWriter("c:/search/resources/indexcount.txt", false);
                indCountWrite.write(Integer.toString(fileNum));
                indCountWrite.close();

                if (fileNum > fList.length) {
                    stopFlag = true; // all files indexed
                }
            } catch (Exception e) {
                System.err.println("Indexer error: " + e.getMessage());
            }
        } // end of run
    } // end of Indexer

    class Table {

        private static final String CONNECTION_ADDRESS = "jdbc:odbc:mitesh";

        // Stop words that should never be indexed.
        private static final String[] STOP_WORDS = {
            "is", "are", "am", "was", "were", "have", "has", "had", "may",
            "might", "must", "shall", "will", "would", "should", "can",
            "could", "ought", "to", "do", "did", "does", "a", "an", "the",
            "in", "of", "at", "as", "into", "for", "from", "while", "if",
            "then", "."
        };

        public boolean avoidToken(String token) {
            for (String w : STOP_WORDS) {
                if (w.equalsIgnoreCase(token)) {
                    return true;
                }
            }
            return false;
        }

        public void insertRecord(String key, String fileAddress) {
            executeUpdate("insert into INDEXER values('" + key.toLowerCase()
                    + "','" + fileAddress + "',1)");
        }

        public void updateRecord(String key, String fileAddress) {
            executeUpdate("update INDEXER set FREQUENCY=FREQUENCY+1"
                    + " where KEYWORD='" + key.toLowerCase()
                    + "' and URLADDRESS='" + fileAddress + "'");
        }

        // True if this (keyword, URL) pair is already in the INDEXER table.
        public boolean matchedRecord(String key, String fileAddress) {
            boolean flag = false;
            String query = "select KEYWORD,URLADDRESS from INDEXER where KEYWORD='"
                    + key.toLowerCase() + "' and URLADDRESS='" + fileAddress + "'";
            try {
                Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
                Connection con = DriverManager.getConnection(CONNECTION_ADDRESS, "", "");
                Statement stmt = con.createStatement();
                ResultSet rs = stmt.executeQuery(query);
                flag = rs.next();
                stmt.close();
                con.close();
            } catch (ClassNotFoundException e) {
                System.err.println("ClassNotFoundException: " + e.getMessage());
            } catch (SQLException ex) {
                System.err.println("SQLException: " + ex.getMessage());
            }
            return flag;
        }

        // Shared helper for INSERT/UPDATE statements.
        private void executeUpdate(String sql) {
            try {
                Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
                Connection con = DriverManager.getConnection(CONNECTION_ADDRESS, "", "");
                Statement stmt = con.createStatement();
                stmt.executeUpdate(sql);
                stmt.close();
                con.close();
            } catch (ClassNotFoundException e) {
                System.err.println("ClassNotFoundException: " + e.getMessage());
            } catch (SQLException ex) {
                System.err.println("SQLException: " + ex.getMessage());
            }
        }
    }
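
    Since the question is about optimization: the dominant cost above is that every token opens and closes its own JDBC connection and runs two statements. A minimal sketch of one fix - a single shared connection with reused PreparedStatements - follows; the class name, constructor, and method names are my own illustration, not part of the original post:

    import java.sql.*;

    public class BatchedIndexWriter implements AutoCloseable {

        private final Connection con;
        private final PreparedStatement insert;
        private final PreparedStatement update;

        public BatchedIndexWriter(String jdbcUrl) throws SQLException {
            // One connection for the whole indexing run instead of one per token.
            con = DriverManager.getConnection(jdbcUrl, "", "");
            con.setAutoCommit(false); // commit once per file, not per token
            insert = con.prepareStatement(
                    "insert into INDEXER (KEYWORD, URLADDRESS, FREQUENCY) values (?, ?, 1)");
            update = con.prepareStatement(
                    "update INDEXER set FREQUENCY = FREQUENCY + 1"
                    + " where KEYWORD = ? and URLADDRESS = ?");
        }

        // Increment the frequency of (keyword, url), inserting the row if it is new.
        // Replaces the separate matchedRecord/insertRecord/updateRecord round trips.
        public void bump(String keyword, String urlAddress) throws SQLException {
            update.setString(1, keyword.toLowerCase());
            update.setString(2, urlAddress);
            if (update.executeUpdate() == 0) { // no existing row was updated
                insert.setString(1, keyword.toLowerCase());
                insert.setString(2, urlAddress);
                insert.executeUpdate();
            }
        }

        public void commit() throws SQLException {
            con.commit();
        }

        @Override
        public void close() throws SQLException {
            insert.close();
            update.close();
            con.close();
        }
    }

    As a bonus, the prepared statements also remove the SQL injection problem that the string-concatenated queries above have.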

    I also want to search HTML files; I have written code for that, which we call the crawler.
    Please check it out.
    Thank you,
    Mitesh
    import java.io.*;
    import java.net.*;
    import java.sql.*;
    import java.util.*;

    public class Crawler {

        public static final String DISALLOW = "Disallow:";
        public static int fileCounter = 1;

        String ret;
        CrawlTable tab;

        public Crawler() {
            tab = new CrawlTable();
            URLConnection.setDefaultAllowUserInteraction(false);
            // Route all HTTP traffic through the proxy.
            Properties props = new Properties(System.getProperties());
            props.put("http.proxySet", "true");
            props.put("http.proxyHost", "webcache-cup");
            props.put("http.proxyPort", "8080");
            System.setProperties(props);
        }

        // Replaces every <...> tag with a space so only visible text remains.
        String avoidHTMLTag(String s) {
            StringBuffer sb = new StringBuffer(s);
            int start = 0;
            int end;
            try {
                while ((start = sb.indexOf("<", start)) != -1
                        && (end = sb.indexOf(">", start)) != -1) {
                    sb.replace(start, end + 1, " ");
                }
                return sb.toString();
            } catch (Exception e) {
                ret = "WRONG HTML FORMAT";
                return "";
            }
        } // end of avoidHTMLTag

        public String start(String strURL) {
            try {
                // Resume the document counter from the previous crawl.
                FileReader clCountRead = new FileReader("c:/search/resources/crawlcount.txt");
                StreamTokenizer countTok = new StreamTokenizer(clCountRead);
                countTok.resetSyntax();
                countTok.wordChars(33, 65535);
                countTok.whitespaceChars(0, ' ');
                countTok.eolIsSignificant(false);
                countTok.nextToken();
                fileCounter = Integer.parseInt(countTok.sval);
                clCountRead.close();

                boolean condition;
                URL url;
                try {
                    // Seed the frontier with the starting URL.
                    url = new URL(strURL);
                    if (!tab.contains(strURL)) {
                        tab.insertRecord(strURL);
                    }
                } catch (MalformedURLException e) {
                    if (!strURL.equals("")) {
                        ret = "ERROR: invalid URL " + strURL;
                    }
                }

                while ((condition = tab.isRecordFalse()) || strURL.length() != 0) {
                    if (condition) {
                        strURL = tab.retrieveFirst(); // next uncrawled URL
                        tab.updateRecord(strURL);     // mark it as crawled
                    } else {
                        strURL = "";
                    }
                    if (strURL.length() == 0) {
                        break;
                    }
                    try {
                        url = new URL(strURL);
                    } catch (MalformedURLException e) {
                        ret = "ERROR: invalid URL " + strURL;
                        tab.delete(strURL);
                        strURL = "";
                        continue;
                    }
                    // Only http: protocol URLs can be searched.
                    if (!"http".equals(url.getProtocol())) {
                        break;
                    }
                    try {
                        // Try opening the URL.
                        URLConnection urlConnection = url.openConnection();
                        urlConnection.setAllowUserInteraction(false);
                        InputStream urlStream = url.openStream();
                        String type = URLConnection.guessContentTypeFromName(url.getFile());
                        if (type == null || !type.equals("text/html")) {
                            break;
                        }

                        // Read the whole page into memory.
                        byte[] b = new byte[1000];
                        StringBuilder page = new StringBuilder();
                        int numRead;
                        while ((numRead = urlStream.read(b)) != -1) {
                            page.append(new String(b, 0, numRead));
                        }
                        String content = page.toString();

                        // Turn punctuation into spaces so the indexer sees clean tokens.
                        String fileString = content;
                        for (char c : new char[] {'(', ')', ',', '.', ':', '?', '!', '@', '\'', '"'}) {
                            fileString = fileString.replace(c, ' ');
                        }
                        fileString = strURL + " " + fileString;

                        // Save the stripped page into the repository for the indexer.
                        File htmlDoc = new File("c:/search/repository/doc" + fileCounter + ".txt");
                        FileWriter fp = new FileWriter(htmlDoc);
                        fp.write(avoidHTMLTag(fileString));
                        fp.close();
                        fileCounter++;
                        urlStream.close();

                        // Extract <a href=...> links and queue unseen HTML pages.
                        String lowerCaseContent = content.toLowerCase();
                        int index = 0;
                        while ((index = lowerCaseContent.indexOf("<a", index)) != -1) {
                            if ((index = lowerCaseContent.indexOf("href", index)) == -1) {
                                break;
                            }
                            if ((index = lowerCaseContent.indexOf("=", index)) == -1) {
                                break;
                            }
                            index++;
                            String remaining = content.substring(index);
                            StringTokenizer st = new StringTokenizer(remaining, "\t\n\r\">#");
                            String strLink = st.nextToken();
                            URL urlLink;
                            try {
                                urlLink = new URL(url, strLink); // resolve relative links
                                strLink = urlLink.toString();
                            } catch (MalformedURLException e) {
                                tab.delete(strLink);
                                continue;
                            }
                            if (!"http".equals(urlLink.getProtocol())) {
                                continue;
                            }
                            try {
                                // Try opening the linked URL.
                                URLConnection urlLinkConnection = urlLink.openConnection();
                                urlLinkConnection.setAllowUserInteraction(false);
                                InputStream linkStream = urlLink.openStream();
                                String strType = URLConnection.guessContentTypeFromName(urlLink.getFile());
                                linkStream.close();
                                // Queue the link only if it looks like HTML and has not
                                // already been searched or scheduled for searching.
                                if ("text/html".equals(strType) && !tab.contains(strLink)) {
                                    tab.insertRecord(strLink);
                                }
                            } catch (IOException e) {
                                ret = "ERROR: couldn't open URL " + strLink;
                            }
                        }
                    } catch (IOException e) {
                        ret = "ERROR1: couldn't open URL " + strURL;
                        tab.delete(strURL);
                        strURL = "";
                    }
                } // end while

                // Persist the counter for the next crawl.
                FileWriter clCountWrite = new FileWriter("c:/search/resources/crawlcount.txt", false);
                clCountWrite.write(Integer.toString(fileCounter));
                clCountWrite.close();
            } catch (Exception e) {
                ret = "ERROR:" + e.getMessage();
            }
            return ret;
        } // end of start
    } // end of class Crawler

    class CrawlTable {

        private static final String CONNECTION_ADDRESS = "jdbc:odbc:mitesh";

        public void insertRecord(String urlAddress) {
            executeUpdate("insert into CRAWLER (URLADDRESS,ISCRAWLED) values('"
                    + urlAddress + "',false)");
        }

        public void delete(String urlAddress) {
            executeUpdate("delete from CRAWLER where URLADDRESS='" + urlAddress + "'");
        }

        public void updateRecord(String urlAddress) {
            executeUpdate("update CRAWLER set ISCRAWLED=true where URLADDRESS='"
                    + urlAddress + "'");
        }

        // True if at least one URL is still waiting to be crawled.
        public boolean isRecordFalse() {
            return exists("select URLADDRESS from CRAWLER where ISCRAWLED=false");
        }

        // True if the URL is already known (crawled or queued).
        public boolean contains(String strURL) {
            return exists("select URLADDRESS from CRAWLER where URLADDRESS='" + strURL + "'");
        }

        // Oldest uncrawled URL, or "" when the frontier is empty.
        public String retrieveFirst() {
            String s = "";
            try {
                Connection con = connect();
                Statement stmt = con.createStatement();
                ResultSet rs = stmt.executeQuery(
                        "select URLADDRESS from CRAWLER where ISCRAWLED=false order by SERIAL");
                if (rs.next()) {
                    s = rs.getString("URLADDRESS");
                }
                stmt.close();
                con.close();
            } catch (Exception ex) {
                System.err.println("SQLException: " + ex.getMessage());
            }
            return s;
        }

        private Connection connect() throws Exception {
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
            return DriverManager.getConnection(CONNECTION_ADDRESS, "", "");
        }

        // Shared helper for INSERT/UPDATE/DELETE statements.
        private void executeUpdate(String sql) {
            try {
                Connection con = connect();
                Statement stmt = con.createStatement();
                stmt.executeUpdate(sql);
                stmt.close();
                con.close();
            } catch (Exception ex) {
                System.err.println("SQLException: " + ex.getMessage());
            }
        }

        private boolean exists(String query) {
            boolean flag = false;
            try {
                Connection con = connect();
                Statement stmt = con.createStatement();
                ResultSet rs = stmt.executeQuery(query);
                flag = rs.next();
                stmt.close();
                con.close();
            } catch (Exception ex) {
                System.err.println("SQLException: " + ex.getMessage());
            }
            return flag;
        }
    }
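
    As a further optimization note on the crawler: every discovered link costs up to three database round trips (contains, insertRecord, and later retrieveFirst). If the crawl frontier fits in memory, a queue plus a set gives the same duplicate checking in O(1) with no database at all. A minimal sketch; the Frontier class is my own illustration, not part of the original post:

    import java.util.*;

    public class Frontier {

        private final Deque<String> queue = new ArrayDeque<String>();
        private final Set<String> seen = new HashSet<String>();

        // Queue a URL unless it has been seen before; returns true if it was new.
        public boolean offer(String url) {
            if (seen.add(url)) { // add() is true only the first time a URL appears
                queue.addLast(url);
                return true;
            }
            return false;
        }

        // Next URL to crawl, or null when the frontier is empty.
        public String poll() {
            return queue.pollFirst();
        }

        public boolean isEmpty() {
            return queue.isEmpty();
        }
    }

    The trade-off is durability: the CRAWLER table survives a restart, while an in-memory frontier does not, so a hybrid (memory first, periodic flush to the table) is a common middle ground.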

  • Crawler Help: Object reference not set to an instance of an object

    I'm trying to write a custom crawler and having some difficulties.  I'm getting the document information from a database.  I'm trying to have the ClickThroughURL be a web URL and the IndexingURL be a UNC path to the file on a back-end file share.  Also, I'm not using DocFetch.  The problem I'm having is that when the crawler runs I get the following error for every card:
    "4/19/05 13:43:30- (940) Aborted Card creation for document: TestDoc1.  Import error: IDispatch error #19876 (0x80044fa4): [Error Importing Card.
    Error writing Indexing File.
    SOAP fault: faultcode='soap:Server' faultstring='Server was unable to process request. --> Object reference not set to an instance of an object.']"
    Has anyone seen this before?  Any help you can provide would be greatly appreciated.  I have included the code from my document.vb in case that helps.
    Thanks,
    Jerry
    DOCUMENT.VB
    Imports System
    Imports Plumtree.Remote.Util
    Imports Plumtree.Remote.Crawler
    Imports System.Resources
    Imports System.Globalization
    Imports System.Threading
    Imports System.IO
    Imports System.Data.SqlClient
    Imports System.Text
    Imports System.Web
    Namespace Plumtree.Remote.CWS.MoFoDocsOpen
        Public Class Document
            Implements IDocument
            Private m_logger As ICrawlerLog
            Private DocumentLocation As String
            Private d_DocumentNumber As Integer
            Private d_Library As String
            Private d_Name As String
            Private d_Author As String
            Private d_AuthorID As String
            Private d_Category As String
            Private d_ClientName As String
            Private d_ClientNumber As String
            Private d_DateCreated As DateTime
            Private d_DocumentName As String
            Private d_DocumentType As String
            Private d_EnteredBy As String
            Private d_EnteredByID As String
            Private d_FolderID As String
            Private d_KEFlag As String
            Private d_LastEdit As DateTime
            Private d_LastEditBy As String
            Private d_LastEditByID As String
            Private d_Maintainer As String
            Private d_MaintainerID As String
            Private d_MatterName As String
            Private d_MatterNumber As String
            Private d_Practice As String
            Private d_Description As String
            Private d_Version As Integer
            Private d_Path As String
            Private d_FileName As String
            Public Sub New(ByVal provider As DocumentProvider, ByVal documentLocation As String, ByVal signature As String)
                Dim location() As String = DocumentLocation.Split("||")
                Me.DocumentLocation = DocumentLocation
                Me.d_DocumentNumber = location(0)
                Me.d_Library = location(2)
                Dim objConn As New SqlConnection
                Dim objCmd As New SqlCommand
                Dim objRec As SqlDataReader
                objConn.ConnectionString = "Server=sad2525;Database=PortalDocs;Uid=sa;Pwd=;"
                objConn.Open()
                objCmd.CommandText = "SELECT * FROM DocsOpenAggregate WHERE Library = '" & Me.d_Library & "' AND DocumentNumber = " & Me.d_DocumentNumber
                objCmd.Connection = objConn
                objRec = objCmd.ExecuteReader()
                Do While objRec.Read() = True
                    Me.d_Name = objRec("Name")
                    Me.d_Author = objRec("Author")
                    Me.d_AuthorID = objRec("AuthorID")
                    Me.d_Category = objRec("Category")
                    Me.d_ClientName = objRec("ClientName")
                    Me.d_ClientNumber = objRec("ClientNumber")
                    Me.d_DateCreated = objRec("DateCreated")
                    Me.d_DocumentName = objRec("DocumentName")
                    Me.d_DocumentType = objRec("DocumentType")
                    Me.d_EnteredBy = objRec("EnteredBy")
                    Me.d_EnteredByID = objRec("EnteredByID")
                    Me.d_FolderID = objRec("FolderID")
                    Me.d_KEFlag = objRec("KEFlag")
                    Me.d_LastEdit = objRec("LastEdit")
                    Me.d_LastEditBy = objRec("LastEditBy")
                    Me.d_LastEditByID = objRec("LastEditByID")
                    Me.d_Maintainer = objRec("Maintainer")
                    Me.d_MaintainerID = objRec("MaintainerID")
                    Me.d_MatterName = objRec("MatterName")
                    Me.d_MatterNumber = objRec("MatterNumber")
                    Me.d_Practice = objRec("Practice")
                    Me.d_Description = objRec("Description")
                    Me.d_Version = objRec("Version")
                    Me.d_Path = objRec("Path")
                    Me.d_FileName = objRec("FileName")
                Loop
                objCmd = Nothing
                If objRec.IsClosed = False Then objRec.Close()
                objRec = Nothing
                If objConn.State <> ConnectionState.Closed Then objConn.Close()
                objConn = Nothing
            End Sub
            'If using DocFetch, this method returns a file path to the document in the backend repository.
            Public Function GetDocument() As String Implements IDocument.GetDocument
                m_logger.Log("Document.GetDocument called for " & Me.DocumentLocation)
                Return Me.d_Path
            End Function
            'Returns the metadata information about this document.
            Public Function GetMetaData(ByVal aFilter() As String) As DocumentMetaData Implements IDocument.GetMetaData
                m_logger.Log("Document.GetMetaData called for " & DocumentLocation)
                Dim DOnvp(23) As NamedValue
                DOnvp(0) = New NamedValue("Author", Me.d_Author)
                DOnvp(1) = New NamedValue("AuthorID", Me.d_AuthorID)
                DOnvp(2) = New NamedValue("Category", Me.d_Category)
                DOnvp(3) = New NamedValue("ClientName", Me.d_ClientName)
                DOnvp(4) = New NamedValue("ClientNumber", Me.d_ClientNumber)
                DOnvp(5) = New NamedValue("DateCreated", Me.d_DateCreated)
                DOnvp(6) = New NamedValue("DocumentName", Me.d_DocumentName)
                DOnvp(7) = New NamedValue("DocumentType", Me.d_DocumentType)
                DOnvp(8) = New NamedValue("EnteredBy", Me.d_EnteredBy)
                DOnvp(9) = New NamedValue("EnteredByID", Me.d_EnteredByID)
                DOnvp(10) = New NamedValue("FolderID", Me.d_FolderID)
                DOnvp(11) = New NamedValue("KEFlag", Me.d_KEFlag)
                DOnvp(12) = New NamedValue("LastEdit", Me.d_LastEdit)
                DOnvp(13) = New NamedValue("LastEditBy", Me.d_LastEditBy)
                DOnvp(14) = New NamedValue("LastEditByID", Me.d_LastEditByID)
                DOnvp(15) = New NamedValue("Maintainer", Me.d_Maintainer)
                DOnvp(16) = New NamedValue("MaintainerID", Me.d_MaintainerID)
                DOnvp(17) = New NamedValue("MatterName", Me.d_MatterName)
                DOnvp(18) = New NamedValue("MatterNumber", Me.d_MatterNumber)
                DOnvp(19) = New NamedValue("Practice", Me.d_Practice)
                DOnvp(20) = New NamedValue("Description", Me.d_Description)
                DOnvp(21) = New NamedValue("Version", Me.d_Version)
                DOnvp(22) = New NamedValue("Path", Me.d_Path)
                DOnvp(23) = New NamedValue("FileName", Me.d_FileName)
                Dim metaData As New DocumentMetaData(DOnvp)
                Dim strExt As String = Right(Me.d_FileName, Len(Me.d_FileName) - InStrRev(Me.d_FileName, "."))
                Select Case LCase(strExt)
                    Case "xml"
                        metaData.ContentType = "text/xml"
                        metaData.ImageUUID = "{F8F6B82F-53C6-11D2-88B7-006008168DE5}"
                    Case "vsd"
                        metaData.ContentType = "application/vnd.visio"
                        metaData.ImageUUID = "{2CEEC472-7CF0-11d3-BB3A-00105ACE365C}"
                    Case "mpp"
                        metaData.ContentType = "application/vnd.ms-project"
                        metaData.ImageUUID = "{8D6D9F50-D512-11d3-8DB0-00C04FF44474}"
                    Case "pdf"
                        metaData.ContentType = "application/pdf"
                        metaData.ImageUUID = "{64FED895-D031-11D2-8909-006008168DE5}"
                    Case "doc"
                        metaData.ContentType = "application/msword"
                        metaData.ImageUUID = "{0C35DD71-6453-11D2-88C3-006008168DE5}"
                    Case "dot"
                        metaData.ContentType = "application/msword"
                        metaData.ImageUUID = "{0C35DD71-6453-11D2-88C3-006008168DE5}"
                    Case "rtf"
                        metaData.ContentType = "text/richtext"
                        metaData.ImageUUID = "{F8F6B82F-53C6-11D2-88B7-006008168DE5}"
                    Case "xls"
                        metaData.ContentType = "application/vnd.ms-excel"
                        metaData.ImageUUID = "{0C35DD72-6453-11D2-88C3-006008168DE5}"
                    Case "xlt"
                        metaData.ContentType = "application/vnd.ms-excel"
                        metaData.ImageUUID = "{0C35DD72-6453-11D2-88C3-006008168DE5}"
                    Case "pps"
                        metaData.ContentType = "application/vnd.ms-powerpoint"
                        metaData.ImageUUID = "{0C35DD73-6453-11D2-88C3-006008168DE5}"
                    Case "ppt"
                        metaData.ContentType = "application/vnd.ms-powerpoint"
                        metaData.ImageUUID = "{0C35DD73-6453-11D2-88C3-006008168DE5}"
                    Case "htm"
                        metaData.ContentType = "text/html"
                        metaData.ImageUUID = "{D2E2D5E0-84C9-11D2-A0C5-0060979C42D8}"
                    Case "html"
                        metaData.ContentType = "text/html"
                        metaData.ImageUUID = "{D2E2D5E0-84C9-11D2-A0C5-0060979C42D8}"
                    Case "asp"
                        metaData.ContentType = "text/plain"
                        metaData.ImageUUID = "{F8F6B82F-53C6-11D2-88B7-006008168DE5}"
                    Case "idq"
                        metaData.ContentType = "text/plain"
                        metaData.ImageUUID = "{F8F6B82F-53C6-11D2-88B7-006008168DE5}"
                    Case "txt"
                        metaData.ContentType = "text/plain"
                        metaData.ImageUUID = "{F8F6B82F-53C6-11D2-88B7-006008168DE5}"
                    Case "log"
                        metaData.ContentType = "text/plain"
                        metaData.ImageUUID = "{F8F6B82F-53C6-11D2-88B7-006008168DE5}"
                    Case "sql"
                        metaData.ContentType = "text/plain"
                        metaData.ImageUUID = "{F8F6B82F-53C6-11D2-88B7-006008168DE5}"
                    Case Else
                        metaData.ContentType = "application/octet-stream"
                        metaData.ImageUUID = "{F8F6B82F-53C6-11D2-88B7-006008168DE5}"
                End Select
                metaData.Name = Me.d_Name
                metaData.Description = Me.d_Description
                metaData.FileName = Me.d_FileName ' This is a file name - for example "2jd005_.DOC"
                metaData.IndexingURL = Me.d_Path ' This is a file path - for example "\\fileserver01\docsd$\SF01\DOCS\MLS1\NONE\2jd005_.DOC"
                metaData.ClickThroughURL = "http://mofoweb/docsopen.asp?Unique=" & HttpUtility.HtmlEncode(Me.DocumentLocation)
                metaData.UseDocFetch = False
                Return metaData
            End Function
            'Returns the signature or last-modified-date of this document that indicates to the portal whether the document needs refreshing.
            Public Function GetDocumentSignature() As String Implements IDocument.GetDocumentSignature
                Dim SigString As New StringBuilder
                Dim SigEncode As String
                SigString.Append(Me.d_DocumentNumber & "||")
                SigString.Append(Me.d_Library & "||")
                SigString.Append(Me.d_Name & "||")
                SigString.Append(Me.d_Author & "||")
                SigString.Append(Me.d_AuthorID & "||")
                SigString.Append(Me.d_Category & "||")
                SigString.Append(Me.d_ClientName & "||")
                SigString.Append(Me.d_ClientNumber & "||")
                SigString.Append(Me.d_DateCreated & "||")
                SigString.Append(Me.d_DocumentName & "||")
                SigString.Append(Me.d_DocumentType & "||")
                SigString.Append(Me.d_EnteredBy & "||")
                SigString.Append(Me.d_EnteredByID & "||")
                SigString.Append(Me.d_FolderID & "||")
                SigString.Append(Me.d_KEFlag & "||")
                SigString.Append(Me.d_LastEdit & "||")
                SigString.Append(Me.d_LastEditBy & "||")
                SigString.Append(Me.d_LastEditByID & "||")
                SigString.Append(Me.d_Maintainer & "||")
                SigString.Append(Me.d_MaintainerID & "||")
                SigString.Append(Me.d_MatterName & "||")
                SigString.Append(Me.d_MatterNumber & "||")
                SigString.Append(Me.d_Practice & "||")
                SigString.Append(Me.d_Description & "||")
                SigString.Append(Me.d_Version & "||")
                SigString.Append(Me.d_Path & "||")
                SigString.Append(Me.d_FileName & "||")
                Dim encoding As New UTF8Encoding
                Dim byteArray As Byte() = encoding.GetBytes(SigString.ToString())
                SigEncode = System.Convert.ToBase64String(byteArray, 0, byteArray.Length)
                Return SigEncode
            End Function
            'Returns an array of the users with access to this document.
            Public Function GetUsers() As ACLEntry() Implements IDocument.GetUsers
                'no acl info retrieved
                Dim aclArray(-1) As ACLEntry
                Return aclArray
            End Function
            'Returns an array of the groups with access to this document.
            Public Function GetGroups() As ACLEntry() Implements IDocument.GetGroups
                'no acl info retrieved
                Dim aclArray(-1) As ACLEntry
                Return aclArray
            End Function
        End Class
    End Namespace
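
    One side note on GetDocumentSignature above: Base64-encoding the concatenation keeps growing as fields are added. Hashing the same concatenation gives a short, fixed-length signature that still changes whenever any field changes. A sketch in Java for comparison; the class is my own illustration, not the Plumtree API:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Base64;

    public final class DocumentSignature {

        // Hash the "||"-delimited field values (same delimiter as the VB code)
        // and return a compact Base64 digest.
        public static String of(String... fields) throws NoSuchAlgorithmException {
            StringBuilder sb = new StringBuilder();
            for (String f : fields) {
                sb.append(f).append("||");
            }
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        }
    }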

    1. I don't think you can just set the indexing URL to a UNC path.
    2. Try creating an index aspx page. Set MetaData.IndexingURL to that page, and include query string parameters for the encoded UNC path as well as the content type.
    3. In the index page, get the content type and path from the query string.
    4. Get the filename from the file path.
    5. Set the headers for Content-Type and Content-Disposition, e.g.
    Response.ContentType = "application/msword";
    Response.AddHeader("Content-Disposition", "inline; filename=" + filename);
    6. Stream out the file:
    FileStream fs = new FileStream(path, FileMode.Open);
    byte[] buffer = new byte[40000];
    int result;
    System.IO.Stream output = Response.OutputStream;
    do
    {
        result = fs.Read(buffer, 0, 40000);
        output.Write(buffer, 0, result);
    } while (result == 40000);
    Put the above in a try-catch, close the stream, and delete the temp file in the finally block.
    If this does not help, set a breakpoint in the code to find the error. Also use log4net to log any errors.
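
    For readers on a Java stack, the same click-through/indexing pattern as a servlet sketch; the class name and the "path"/"type" query string parameters are my own assumptions, mirroring the steps above:

    import java.io.*;
    import javax.servlet.http.*;

    public class FileStreamServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            String path = req.getParameter("path"); // decoded UNC path to the file
            String type = req.getParameter("type"); // content type passed by the crawler
            File file = new File(path);

            resp.setContentType(type);
            resp.setHeader("Content-Disposition", "inline; filename=" + file.getName());

            // Buffered copy of the file into the response, closed in all cases.
            byte[] buffer = new byte[40000];
            InputStream in = new FileInputStream(file);
            try {
                OutputStream out = resp.getOutputStream();
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            } finally {
                in.close();
            }
        }
    }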

  • How to set Full Crawl Schedule as None via PowerShell command

    How to set Full Crawl Schedule as None via PowerShell command

    $ssa = "Search Service Application"
    $contentSource = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint Sites"
    $contentSource.IncrementalCrawlSchedule = $null
    $contentSource.FullCrawlSchedule = $null
    $contentSource.Update()
    Basically, you set each schedule to $null and call Update().
    Amit

  • Why is my type jagged in crawl effect setting?

    I'm pretty new to Final Cut Express, and I'm making a little presentation video. The only problem is that when I set type to scroll, crawl, or whatever across the bottom of the screen, it looks like crap. All the edges are ragged with little horizontal lines. The static type looks just fine. Am I doing something wrong here? Is there a way to fix this? I keep looking around, but nothing I've found mentions this problem and how to remedy it. I'd appreciate any help on this. Thanks!!

    One way I found to somewhat improve the quality was to make the text sequence in LiveType much longer than needed and then slow it down in FCE. There will probably be those who shudder at this, but it works. Also, LiveType does not render via vector graphics - not surprising, really.

  • The account password was not specified. Error after setting up a second crawl database and crawler on a different server.

    We have a three server farm. (SPWebTest, SPAppTest, SPDBTest)
    We have added an additional database server (SPDBRMTest) and an additional application server (SPAPPRMTest).
    Today I created a new crawl database on SPDBRMTest and a new crawler on SPAPPRMTest.
    I created a distribution rule to route all crawling activity for one web application to the new crawl database on SPDBRMTest.
    This web application was part of the original crawl and had no errors or issues. We are trying to scale our search to improve performance, but when a full crawl is executed against this content source I get the following crawl error:
    "The account password was not specified. Specify the password."
    I have tried re-entering the "Default Content Access Account" but the issue continues.

    Hi Brian,
    when you added the crawl rule, does the account provided have permission to read the content in the web application?
    http://technet.microsoft.com/en-us/library/jj219686(v=office.15).aspx
    Also, try disabling the loopback check; it may help:
    http://support.microsoft.com/kb/896861
    Regards,
    Aries
    Microsoft Online Community Support

  • I have an Apple MacBook Pro, and when surfing the web my computer will slow to a crawl, with a multi-colored spinning wheel visible until my latest request is handled. What is causing this, and is there a way to prevent it from occurring?

    I have a MacBook Pro.  When surfing the web it will eventually slow to a crawl.  When this occurs, there will be a small multi-colored wheel spinning until my latest command is handled.  What is causing this and is there a way that I can modify or prevent this from happening?  Is there a setting that will prevent this?

    When you next have the problem, note the exact time: hour, minute, second.
    If you have more than one user account, these instructions must be carried out as an administrator.
    Launch the Console application in any of the following ways:
    ☞ Enter the first few letters of its name into a Spotlight search. Select it in the results (it should be at the top.)
    ☞ In the Finder, select Go ▹ Utilities from the menu bar, or press the key combination shift-command-U. The application is in the folder that opens.
    ☞ Open LaunchPad. Click Utilities, then Console in the icon grid.
    Make sure the title of the Console window is All Messages. If it isn't, select All Messages from the SYSTEM LOG QUERIES menu on the left. If you don't see that menu, select
    View ▹ Show Log List
    from the menu bar.
    Scroll back in the log to the time you noted above. Select any messages timestamped from then until the end of the episode, or until they start to repeat. Copy them to the Clipboard (command-C). Paste into a reply to this message (command-V).
    When posting a log extract, be selective. In most cases, a few dozen lines are more than enough.
    Please do not indiscriminately dump thousands of lines from the log into this discussion.
    Important: Some private information, such as your name, may appear in the log. Anonymize before posting.

  • Not able to crawl all items from External content type

    Hello All,
    "Not all the records in my external content source are getting crawled - only a third of the data is."
    Steps:
    I created an "External content type" using SharePoint Designer which connects to a SQL Server database.
    I have written a SQL view joining 2 tables, which returns 900,000 rows when executed in SQL Server Management Studio.
    I used the default "Business Data Connectivity Service" and "Search Service Application" and made sure the necessary permissions are set.
    Created an external content source for the search service application and selected "Business Data Connectivity Service" -> "Crawl selected External datasource" -> <my external data source created in SharePoint Designer>
    Issue
    When I ran the full crawl for the first time it crawled 349,923 records in 1 hour and 7 seconds, and returned 1 error: "Error crawling LOB Contents. (Error caused by exception: System.InvalidOperationException. There is an error in XML
    document...)"
    Later I removed the below item from the index and started a full recrawl; this time it crawled 349,924 records - one record more than my previous crawl -
    and no errors were returned.
    Please let me know what the issue could be. It doesn't look permission-related, as I am able to crawl a third of my total data, and I am able to search the crawled data. I also set the throttling limit for the "Business Data Catalog"
    to -maximum 10000000 -default 1000000, which is more than the data it has to crawl.
    SRIRAM

    Hi,
    I started making the change suggested in the link you shared, but got stuck at a point.
    The field I set as the identifier in BCS earlier does not have unique values. The SQL view returns 899,000 rows in total, but the column set as the identifier has
    349,923 unique values, which is equal to the number of rows crawled. - Is this the reason why it didn't crawl all the records?
    The table used in the SQL view has a composite key. - Is it possible to have multiple identifiers in BCS as well?
    Is it possible to make BCS ignore the identifier? I mean, creating the BCS entity without an identifier column?
    Please let me know your suggestions on this.
    Thanks,
    SRIRAM
    1. Yes, BCS needs a UID. This is so it can figure out changes over time to a single record. Otherwise every change to a row could look like a potential new row, without BCS knowing any better.
    2. Yes - or just have it run off the composite key instead of the field you're using now.
    3. Nope, BCS needs a UID field, as in answer 1 :)
    Good luck!
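
    To see why duplicate identifiers collapse rows, here is a tiny illustration (plain Java, not BCS code): anything keyed by a non-unique identifier keeps only one entry per key, which is exactly why an 899,000-row view with 349,923 distinct identifier values crawls as 349,923 items.

    import java.util.*;

    public class DedupDemo {
        public static void main(String[] args) {
            // Three rows but only two distinct identifiers ("A" repeats).
            String[][] rows = { {"A", "row 1"}, {"B", "row 2"}, {"A", "row 3"} };

            Map<String, String> byIdentifier = new LinkedHashMap<String, String>();
            for (String[] row : rows) {
                byIdentifier.put(row[0], row[1]); // the last row with each key wins
            }
            System.out.println(byIdentifier.size()); // prints 2, not 3
        }
    }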

  • Downloads for different products (e.g., Crawler Web Security) won't take

    I attempted to download different toolbars and products, such as those from Crawler's Spyware and Web Security, without any luck. It goes through the usual set-up windows, and a window opens to say that the downloads have been successfully completed. I cannot find them!

    Does the ext directory have php_oci8.dll? In the original steps the PHP dir is renamed. In the given php.ini the extension_dir looks like it has been updated correctly. Since PHP distributes php_oci8.dll by default, I reckon there is a very good chance the problem was somewhere else. Since this is an old thread, I don't think we'll get much value from speculation.
    -- cj

  • Why has my native Windows 7 Pro 64 network to a Windows 7 Pro 64 VM slowed to a crawl and started freezing after the new Arris cable modem was installed?

    Please appreciate the details, so the complete situation can be visualized.
    The office network:
    A new Mac mini PC running Win 7 Pro 64 (in Boot Camp), networked to an iMac running the same Win 7 Pro 64 as a VM under Fusion 7 Pro.
    To review: my treatment room's former Dell PC, running Vista Business with the server and network versions of my dental practice management and X-ray capture apps, had no problems on the WLAN with the front desk iMac running the same PC client dental network apps under the same Vista Business and Fusion 4. The Dell slowly fried its power supply.
    I could not wait to replace it with a Mac mini / Boot Camp / Win 7 Pro 64-bit, as it has to run natively to capture X-rays. The front desk iMac was also upgraded to OS X 10.10.3 / Fusion 7 Pro / and its own matching Win 7 Pro 64-bit. The iMac is wired and WLAN connected to a Netgear D600 router and a Time Warner Cable modem. The modem then failed and was replaced by an Arris modem. As the Arris was configured out of the box, the network ran much slower compared to the original modem. BUT the Vonage two-line phone system stopped working - no incoming or outgoing voices could be heard after it rang. Then TWC remotely configured it, and the phone now works, but the networked iMac slowed to a crawl and now freezes when accessing the Mac mini PC dental server to edit or enter data. The mini accesses the internet perfectly via wifi. It accesses the front desk Canon printer by wifi, and can print via sharing through the iMac that is wired to the printer as well, BUT the network connection is a spinning pinwheel. The front desk iMac easily finds the Mac mini PC server on the network. It connects easily by entering the server's IPv4 address, should that last value change. But if data isn't entered in slow motion, the window and app freeze and have to be force quit via Task Manager!
    Possible causes:
    1. I think TWC switched the cable modem from the default (non-bridged) to bridged mode. Could this be the cause?
    2. The network setting in Fusion on the iMac is auto detect, a setting that always worked with Vista and with the new Win 7 system before TWC installed the new Arris modem. Switching to NAT does not work at all - is there a different setting to use?
    3. Does the Netgear router have to have a port or preference configured so as not to block the server?
    4. Might I need an access point (I have an extra Netgear WDR3400)?
    5. Do I need an extender that plugs into the AC line?
    I hope this info helps.
    PS (some additional info):
    1. File sharing works fine between the mini's Mac and Boot Camp sides, between the iMac's Mac and VM sides, and between either side of either of the two computers! 2. Also, either office computer and all of my home Macs (and my iPad and iPhone) can access each other instantly and flawlessly with Splashtop Streamer and Jump Desktop remote apps, as controller or as client! In fact, the front desk iMac can remote control the mini PC as if you were directly entering info into it!
    SO I think it's either a misconfiguration in the TWC cable modem and/or some additional preference to configure in the router? Thanks so much for your help (been using VMs since VPC1 and Win95 on an iMac DV model, 1999!). TWC blames Vonage for not opening some ports in the telephone modem. Vonage blames TWC for not opening its ports in the cable modem, when they meant the ports on the Netgear. They have passed the buck for six weeks with not even a guess at what to do. VERY FRUSTRATING! Again, thanks to all!

    Okay Saint Steven, I'll bite:
    I read your description multiple times, very carefully, and I am still confused. May I suggest the following:
    * Follow basic troubleshooting procedures. Isolate each individual problem. Don't try to get everything working at once. Make one change at a time.
    * If I read your description correctly, the Netgear D600 (??) Router is the office LAN (not WLAN). Connect each piece on the LAN separately one-by-one, and test each piece individually. Get your LAN and its devices working first. Forget about the Vonage and the Arris Cable modem.
    * Next, look at your Internet connection. Again, try each piece separately. Depending on how the original cable modem and the new Arris cable modem are configured, you may also have to adjust the configuration of the Netgear D600 Router. 
    * Finally, if you are still having problems, try to organize your thoughts and presentation before you post again. List each element in your network. List how each element is configured. Be systematic.
    "Arris cable modem" does not help much - some are simple bridges, others are routers, etc.
    To answer your questions:
    1. I can't tell from your confusing description, but I don't think so. It seems like the problems are associated with the system changes you made (replacing Dell, etc.) and the problems seem to be isolated to the local LAN.
    2. Auto detect should work, but it is easy to change and test, as long as you change one thing at a time.
    3. No. The Netgear router should allow all communications on the local LAN. Could you have put the "server" in the DMZ?
    Note: I could not find a Netgear "D600" model in a web search. Do you mean N600?
    4. No. I doubt that you need an additional access point. Based on your description, everything appears to connect to the LAN.
    5. No. See 4.
    To repeat myself:
    * Do careful, systematic troubleshooting.
    * Provide carefully written, detailed description of your network and system configurations when you ask for help.
    Good luck!
