namazu-1.3.0.10

What is Namazu?

Namazu is search engine software designed to be easy to use. It works not only as a CGI program for small- to medium-scale WWW search engines, but also for personal use, such as searching the files on a local HDD. Search clients for Mule, Tcl/Tk, Java, and Win32 are also available.

Installation

Download namazu-1.3.0.10.tar.gz from http://openlab.ring.gr.jp/namazu/index.html.en
# cp namazu-1.3.0.10.tar.gz /usr/local/src/
# cd /usr/local/src
# tar zxvf namazu-1.3.0.10.tar.gz
# cd namazu-1.3.0.10/src
# ./configure --prefix=/usr/local --with-cgi-dir=/usr/local/apache/cgi-bin --with-admin=webmaster@hoge.bt --with-perl5=/usr/bin/perl --without-japanese

loading cache ./config.cache
checking for gcc... gcc
checking whether the C compiler (gcc  ) works... yes
checking whether the C compiler (gcc  ) is a cross-compiler... no
checking whether we are using GNU C... yes
checking whether gcc accepts -g... yes
checking for main in -lm... yes
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for fcntl.h... yes
checking for unistd.h... yes
checking for working const... yes
checking for pid_t... yes
checking for size_t... yes
checking for working alloca.h... yes
checking for alloca... yes
checking for 8-bit clean memcmp... yes
checking for re_comp... yes
checking for memmove... yes
webmaster's email address is set to webmaster@hoge.bt
using /usr/bin/ for perl5
./configure: /usr/bin/: is a directory
checking for zcat... /bin/zcat
checking for jgroff... no
checking for groff... /usr/bin/groff
updating cache ./config.cache
creating ./config.status
creating Makefile
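The log lines `using /usr/bin/ for perl5` and `./configure: /usr/bin/: is a directory` suggest that configure was handed a directory rather than the perl executable itself. Checking the value before running configure catches this early. This is a minimal bash sketch; `check_perl_path` is a hypothetical helper, not part of Namazu.

```shell
# check_perl_path: hypothetical helper, not shipped with Namazu.
# Succeeds only if $1 is an executable file such as /usr/bin/perl;
# a directory such as /usr/bin/ is rejected.
check_perl_path() {
  local p=$1
  if [ -d "$p" ]; then
    echo "error: $p is a directory; pass the perl binary itself" >&2
    return 1
  fi
  if [ ! -f "$p" ] || [ ! -x "$p" ]; then
    echo "error: $p is not an executable file" >&2
    return 1
  fi
}

# Usage: check the value, then pass the same value to configure
# (together with the other options shown above):
#   PERL=$(command -v perl)
#   check_perl_path "$PERL" && ./configure --with-perl5="$PERL" ...
```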
  • Check Makefile
    BASEDIR         = /usr/local
    CGIDIR          = /usr/local/apache/cgi-bin
    NAMAZUDIR       = $(BASEDIR)/namazu
    BINDIR_SYS      = $(BASEDIR)/bin  # install 'namazu' 'mknmz' command.
    BINDIR          = $(NAMAZUDIR)/bin
    INDEXDIR        = $(NAMAZUDIR)/index
    DOCDIR          = $(NAMAZUDIR)/doc
    LIBDIR          = $(NAMAZUDIR)/lib
    CONTRIBDIR      = $(NAMAZUDIR)/contrib
    
    ...
    
    OPT_NAMAZU_CONF         = $(LIBDIR)/namazu.conf
    
    ...
    
    OPT_PATH_PERL           = /usr/bin/perl
    OPT_PATH_NKF            =
    OPT_PATH_KAKASI         =
    OPT_PATH_CHASEN         =
    
    OPT_ADMIN_EMAIL         = webmaster@hoge.bt
    ...
    OPT_HTDOCUMENT_ROOT     = /usr/local/apache/htdocs
    OPT_HTDOCUMENT_ROOT_URL_PREFIX  = http://www.hoge.bt/
    ...
    OPT_TARGET_FILE = .*\\.html?|.*\\.txt|.*\\.shtml|.*_default
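OPT_TARGET_FILE decides which file names mknmz indexes by default. The doubled backslashes are presumably there because the value passes through the sed substitution seen in the make output, so the regex itself reads `.*\.html?|.*\.txt|.*\.shtml|.*_default`. The sketch below tests sample file names against it with `grep -E`; `is_target` is a hypothetical helper, and anchoring the pattern to the whole file name is an assumption about how mknmz applies it.

```shell
# Regex from OPT_TARGET_FILE with the Makefile escaping removed (assumption).
TARGET_RE='.*\.html?|.*\.txt|.*\.shtml|.*_default'

# is_target: hypothetical helper. Returns 0 if the whole file name
# matches one of the alternatives in TARGET_RE.
is_target() {
  printf '%s\n' "$1" | grep -Eq "^(${TARGET_RE})\$"
}

# Usage:
#   is_target index.html && echo "would be indexed"
#   is_target logo.png  || echo "would be skipped"
```

The mknmz options -a (target all files) and -t (regex) in the usage list further down change this selection for a single run.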
    
  • Compile and Install
    # make
    rm -f mknmz
    sed -e 's!%OPT_PATH_PERL%!/usr/bin/!g' \
        -e 's!%OPT_SYSTEM%!UNIX!g' \
        -e 's!%OPT_PATH_NKF%!!g' \
        -e 's!%OPT_PATH_KAKASI%!!g' \
        -e 's!%OPT_PATH_CHASEN%!!g' \
    ...
    gcc namazu.o codeconv.o messages.o parser.o cgi.o wakati.o conf.o hlist.o output.o search.o values.o form.o re_match.o regex.o util.o seed.o -lm -o namazu
    cp namazu namazu.cgi
    
    # make install
    if [ ! -d /usr/local/namazu/bin ]; then \
        mkdir -p /usr/local/namazu/bin; \
    fi
    ...
    cp ../lib/* /usr/local/namazu/lib
    cp ../contrib/* /usr/local/namazu/contrib
    cp namazu mknmz /usr/local/bin
    
  • Edit /usr/local/bin/mknmz
    Comment out all lines containing the $WAKATI and $MorphOpt variables.
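Before commenting anything out, it helps to list every line that mentions the two variables; after editing, `perl -c /usr/local/bin/mknmz` checks that the script still compiles. A minimal sketch; `list_wakati_lines` is a hypothetical helper, not part of Namazu.

```shell
# list_wakati_lines: hypothetical helper. Prints "lineno:text" for every
# line of the given file that mentions $WAKATI or $MorphOpt.
list_wakati_lines() {
  grep -n -E '\$(WAKATI|MorphOpt)' "$1"
}

# Usage:
#   list_wakati_lines /usr/local/bin/mknmz   # review, then edit by hand
#   perl -c /usr/local/bin/mknmz             # syntax check after editing
```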

Making Index

Before you can search, Namazu needs an index of your documents, built in advance much like a dictionary. The `mknmz' command does this. Basic usage:

    # cd /usr/local/namazu/index/
    # mknmz /usr/local/apache/htdocs/

Specify the target directory that contains the files you want to index. If it has subdirectories, `mknmz' traverses them recursively and indexes them as well.

When processing finishes, `mknmz' writes the index as files named 'NMZ.*' in the current directory.
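Instead of changing into the index directory first, the -O (dir) option from the mknmz usage list can name the output directory directly. A minimal sketch; `rebuild_index` is a hypothetical wrapper, and treating "no NMZ.* files" as a failure is an assumption for illustration.

```shell
# rebuild_index: hypothetical wrapper, not part of Namazu. It uses the
# documented `mknmz -O (dir)` option instead of cd-ing into the index dir.
rebuild_index() {
  local index_dir=$1 doc_root=$2
  mkdir -p "$index_dir" || return 1
  mknmz -O "$index_dir" "$doc_root" || return 1
  # mknmz names its index files NMZ.*; fail if none were produced
  ls "$index_dir"/NMZ.* >/dev/null 2>&1
}

# Usage, with the paths used in this guide:
#   rebuild_index /usr/local/namazu/index /usr/local/apache/htdocs/
```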

  • mknmz command usage
          -a: target all files
          -c: use ChaSenI as Japanese processor
          -e: exclude files which has robot exclusion
          -h: treat header part of Mail/News well
          -k: use KAKASI as Japanese processor
          -m: use ChaSenI as Japanese processor with morphological processing
          -q: suppress status messages during execution
          -r: treat man files
          -u: decode uuencoded part and discard BinHex part
          -x: do not make summary with structure of HTML's headings
          -D: do not insert headers such as 'Date:' to summary (default: off)
          -E: delete symbols on edge of word (default: off)
          -G: delete Okurigana in word (default: off)
          -H: ignore words consist of Hiragana only (default: off)
          -K: delete all symbols (default: off)
          -L: do not adjust beginning and end of line (default: off)
          -M: do not do special processing for MHonArc (default: off)
          -P: do not make the index for phrase search (default: off)
          -R: do not make the index for regexp search (default: off)
          -U: do not encode URL (default: off)
          -W: do not make the index for sort by date (default: off)
          -X: do not make the index for field search (default: off)
          -Y: do not detect deleted documents (default: off)
          -Z: do not detect update and deleted documents (default: off)
          -A: exclude files restricted by .htaccess
          -l (lang): specify the language ('en' or 'ja', default:en)
          -I (file): include user defined file in advance of index processing
          -F (file): load a file which contains list of target files
          -O (dir) : specify a directory to output the index
          -T (dir) : specify a directory where NMZ.{head,foot,body}.* are
          -t (regex): specify a regex for target files
    

Automatic Indexing

  • Make /etc/cron.daily/namazu.cron
    #!/bin/bash
    # cron's PATH may not include /usr/local/bin, so call mknmz by full path
    cd /usr/local/namazu/index/ || exit 1
    /usr/local/bin/mknmz -q /usr/local/apache/htdocs/
    
  • Make the script executable
    # chmod 755 /etc/cron.daily/namazu.cron
    

Searching

The `namazu' command is the search program. Basic usage:

    % namazu "test"
    
  • namazu command usage
      Copyright (C) 1997-1999 Satoru Takabayashi All rights reserved.
      Search Program of Namazu v1.3.0.10
    
      usage: namazu [options] <query> [index dir(s)]
         -n (num)  : set number of documents shown at once.
         -w (num)  : set first number of documents shown.
         -s        : output by short format.
         -S        : output by more short format (simple listing).
         -v        : print this help and exit.
         -f (file) : set pathname of namazu.conf.
         -h        : output by HTML format.
         -l        : sort documents in reverse order.
         -e        : sort documents in normal order.
         -a        : output all documents.
         -c        : output only hit counts
         -r        : do not display reference hit counts.
         -o (file) : set output file name.
         -C        : print current configuration.
         -H        : output further result link (nearly meaningless) .
         -F        : force ... region to output.
         -R        : do not replace URL string.
         -U        : do not decode URL encode when plain text output.
         -L (lang) : set output language (ja or en)
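The -c option can be wrapped in a small helper, for example to check from a script that the index answers a query. This is a sketch: `hit_count` is a hypothetical name, the default index path is INDEXDIR from the Makefile above, and the example assumes `namazu -c` prints only the number.

```shell
# hit_count: hypothetical helper, not shipped with Namazu.
# Runs `namazu -c` (output only hit counts) for one query against an
# explicit index directory; defaults to INDEXDIR from the Makefile.
hit_count() {
  local query=$1
  local index_dir=${2:-/usr/local/namazu/index}
  namazu -c "$query" "$index_dir"
}

# Example (assumes -c prints a bare number):
#   n=$(hit_count "test")
#   [ "${n:-0}" -gt 0 ] || echo "warning: no hits for 'test'" >&2
```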

Configure

  • Edit /usr/local/namazu/lib/namazu.conf (copy from namazu.conf-dist)
    # This is the Namazu configuration file.
    #   originally, this file is named 'namazu.conf-dist'. so you should
    #   copy this to 'namazu.conf' to use.
    #
    #   item and value are MUST be separated with TAB character.
    #   see "manual.html#NAMAZU_CONF" for detailed information.
    #     <Directive List>
    #       * INDEX   : Pathname where index file (NMZ.*) is placed.
    #       * REPLACE : Replace URL string for search result output.
    #                   describe by TARGET, REPLACEMENT order.
    #                   if you do not want to do this replacement in command
    #                   line use, you can run 'namazu' with -U option and
    #                   avoid this processing.
    #       * BASE    : append <BASE HREF="..."> to search result HTML.
    #                   this value must terminate with '/' or '\' character.
    #       * LOGGING : set OFF to turn off search keyword logging.
    #                   default: logging ON (to NMZ.slog)
    #       * LANG    : set language code registrated in ISO 639
    #                   such as `ja', `en', `de', and etc.
    #                   if you set 'de' to this, namazu would use
    #                   NMZ.(head|foot|body|msg).de as message files.
    #       * SCORING : set scoring method TFIDF or SIMPLE.
    #
    #INDEX           /usr/local/namazu/index
    REPLACE /usr/local/apache/htdocs/      http://www.hoge.bt/
    #BASE            file://localhost/home/foo/documents/
    #LOGGING OFF
    LANG            en
    #SCORING TFIDF
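Because item and value must be separated by a TAB, lines copied from a web page (where TABs often turn into spaces) may no longer be valid. A small awk-based check that reports non-comment lines without a TAB; `check_nmz_conf` is a hypothetical helper, not shipped with Namazu.

```shell
# check_nmz_conf: hypothetical checker. Prints every non-comment,
# non-blank line that has no TAB and returns 1 if it found any.
check_nmz_conf() {
  awk '
    /^[ \t]*#/ || /^[ \t]*$/ { next }      # skip comments and blank lines
    index($0, "\t") == 0 {
      printf "line %d: no TAB between item and value: %s\n", NR, $0
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage:
#   check_nmz_conf /usr/local/namazu/lib/namazu.conf
```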
    
  • Edit and customize the following three files for the search page
    1. /usr/local/namazu/index/NMZ.head.en
    2. /usr/local/namazu/index/NMZ.body.en
    3. /usr/local/namazu/index/NMZ.foot.en
  • Copy namazu.cgi
    # cp /usr/local/src/namazu-1.3.0.10/src/namazu.cgi /usr/local/apache/cgi-bin
    
  • Open the search page (http://hostname/cgi-bin/namazu.cgi) in a browser
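The links in the search results come from the REPLACE directive in namazu.conf, which turns an indexed file path into a URL. The sketch below illustrates the idea for the REPLACE line used in this guide; `replace_url` is hypothetical and assumes REPLACE is a plain prefix substitution, which may not match Namazu's exact rules.

```shell
# replace_url: hypothetical illustration of REPLACE, assuming a plain
# prefix substitution: TARGET at the start of the path becomes REPLACEMENT.
replace_url() {
  local path=$1 target=$2 replacement=$3
  case $path in
    "$target"*) printf '%s\n' "${replacement}${path#"$target"}" ;;
    *)          printf '%s\n' "$path" ;;   # not under TARGET: unchanged
  esac
}

# Usage:
#   replace_url /usr/local/apache/htdocs/docs/a.html \
#               /usr/local/apache/htdocs/ http://www.hoge.bt/
```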

