Last updated 11 Feb 2004
Namazu is a full-text search engine intended for easy use. It works not only as a small- or medium-scale Web search engine, but also as a personal search system for email and other files.
You can use it to start a document management system in your enterprise. It is a powerful tool for indexing shared documents, such as MS Word, MS Excel and PDF files, and for searching them from a web page so that a working group can share information. Individual files are isolated and cannot be reused unless there is an index that leads to them. Think about the Internet: it would be useless without search engines.
Namazu may eliminate the need for a database in your organization, because nobody has to enter data into a database manually. All you need to do is keep all documents in one location and index them regularly (daily).
(The Japanese word `Namazu' means `catfish' in English.)
| Software | File |
|---|---|
| Namazu 2.0.12 for Win32 | nmz2012.exe |
| Active Perl | ActivePerl-5.6.1.635-MSWin32-x86.msi (download build 600 series, not 800 series) |


@echo off
REM ---- Contents directory ----
SET D_CT=C:\Inetpub\wwwroot
REM ---- System parameters ----
SET F_PL=C:\perl\bin\perl.exe
SET D_NZ=C:\namazu
SET F_MK=%D_NZ%\bin\mknmz -s -U -O
SET F_GC=%D_NZ%\bin\gcnmz -v
SET D_IX=%D_NZ%\var\namazu\index
SET F_LG=%D_NZ%\mknmz.log
SET F_ER=%D_NZ%\mknmz_err.log
REM ---- Remove lock file (left over from an unexpected failure) ----
IF EXIST "%D_IX%\NMZ.lock2" DEL "%D_IX%\NMZ.lock2"
echo ---- Indexing ----
%F_PL% %F_MK% "%D_IX%" "%D_CT%" 1>"%F_LG%" 2>"%F_ER%"
echo ---- Garbage collection ----
%F_PL% %F_GC% "%D_IX%" 1>>"%F_LG%" 2>>"%F_ER%"
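For readers who prefer a script to a batch file, the same steps (remove a stale NMZ.lock2, run mknmz, then run gcnmz, logging both) can be sketched in Python. This is a minimal sketch, assuming the same paths as the batch file above; `build_commands` and `run_indexing` are hypothetical names, and the mknmz/gcnmz options are copied unchanged from the batch file rather than re-derived.

```python
import subprocess
from pathlib import Path, PureWindowsPath

# Paths copied from the batch file; adjust them to your installation.
PERL = r"C:\perl\bin\perl.exe"
NAMAZU_DIR = r"C:\namazu"
INDEX_DIR = r"C:\namazu\var\namazu\index"
CONTENT_DIR = r"C:\Inetpub\wwwroot"


def build_commands(perl, namazu_dir, index_dir, content_dir):
    """Return the mknmz and gcnmz command lines as argument lists."""
    bin_dir = PureWindowsPath(namazu_dir) / "bin"
    # Same options as F_MK and F_GC in the batch file.
    mknmz = [perl, str(bin_dir / "mknmz"), "-s", "-U", "-O",
             index_dir, content_dir]
    gcnmz = [perl, str(bin_dir / "gcnmz"), "-v", index_dir]
    return mknmz, gcnmz


def run_indexing():
    """Hypothetical driver performing the batch file's steps (Windows only)."""
    lock = Path(INDEX_DIR) / "NMZ.lock2"
    if lock.exists():  # stale lock left by an interrupted run
        lock.unlink()
    mknmz, gcnmz = build_commands(PERL, NAMAZU_DIR, INDEX_DIR, CONTENT_DIR)
    log = Path(NAMAZU_DIR) / "mknmz.log"
    err = Path(NAMAZU_DIR) / "mknmz_err.log"
    # Indexing overwrites the logs; garbage collection appends to them.
    with open(log, "w") as out, open(err, "w") as errs:
        subprocess.run(mknmz, stdout=out, stderr=errs)
    with open(log, "a") as out, open(err, "a") as errs:
        subprocess.run(gcnmz, stdout=out, stderr=errs)
```

Keeping the command construction separate from the driver makes it easy to print and check the exact command lines before calling `run_indexing()` from a scheduled task on the server.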
REM ---- Connect remote directory ----
net use * /d /yes
net use Z: \\srv\share /persistent:yes
| $ON_MEMORY_MAX | RAM |
|---|---|
| 5000000 (5MB, default value) | 64MB |
| 50000000 (50MB) | 512MB |
| 100000000 (100MB) | 1GB |
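The table is a step lookup on installed RAM. A minimal sketch of that lookup, assuming you round down to the nearest row and fall back to the default on smaller machines; `suggest_on_memory_max` is a hypothetical helper, and nothing in Namazu reads it.

```python
# Rows of the table: (installed RAM in MB, suggested $ON_MEMORY_MAX in bytes).
ON_MEMORY_TABLE = [
    (64, 5_000_000),      # default value
    (512, 50_000_000),
    (1024, 100_000_000),
]


def suggest_on_memory_max(ram_mb):
    """Return the value for the largest RAM row not exceeding ram_mb.

    Machines with less RAM than the first row get the default value.
    """
    choice = ON_MEMORY_TABLE[0][1]
    for row_ram, value in ON_MEMORY_TABLE:
        if ram_mb >= row_ram:
            choice = value
    return choice
```

For example, a 256MB machine falls between rows and stays on the 5MB default, while a 2GB machine gets the 100MB setting.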
#
# This is a Namazu configuration file for mknmz.
#
package conf; # Don't remove this line!
#===================================================================
#
# Administrator's email address
#
# $ADDRESS = 'webmaster@foo.bar.jp';
$ADDRESS = 'yourname@yourdomain.com';
#===================================================================
#
# Regular Expression Patterns
#
#
# This pattern specifies HTML suffixes.
#
# $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";
$HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}|htm|php|asp|jsp|xsp";
#
# This pattern specifies file names which will be targeted.
# NOTE: It can be specified by --allow=regex option.
# Do NOT use `$' or `^' anchors.
# Case-insensitive.
#
# $ALLOW_FILE = ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
# "|.*\\.gz|.*\\.Z|.*\\.bz2" . # Compressed files
# "|.*\\.pdf|.*\\.ps" . # PDF, PostScript
# "|.*\\.tex|.*\\.dvi" . # TeX, DVI
# "|.*\\.rpm|.*\\.deb" . # RPM, DEB
# "|.*\\.doc|.*\\.xls|.*\\.ppt" . # Word, Excel, PowerPoint
# "|.*\\.j[sabf]w|.*\\.jtd" . # Ichitaro 4, 5, 6, 7, 8
# "|\\d+|[-\\w]+\\.[1-9n]"; # Mail/News, man
$ALLOW_FILE = ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
"|.*\\.pdf|.*\\.ps" . # PDF, PostScript
"|.*\\.tex|.*\\.dvi" . # TeX, DVI
"|.*\\.doc|.*\\.xls|.*\\.ppt" . # Word, Excel, PowerPoint
"|.*\\.j[sabf]w|.*\\.jtd"; # Ichitaro 4, 5, 6, 7, 8
#
# This pattern specifies file names which will NOT be targeted.
# NOTE: It can be specified by --deny=regex option.
# Do NOT use `$' or `^' anchors.
# Case-insensitive.
#
# $DENY_FILE = ".*\\.(gif|png|jpg|jpeg)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";
$DENY_FILE = ".*\\.(gif|png|jpg|jpeg|exe|zip|msi)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";
#
# This pattern specifies PATHNAMEs which will NOT be targeted.
# NOTE: Usually specified by --exclude=regex option.
#
# $EXCLUDE_PATH = undef;
#
# This pattern specifies file names which can be omitted
# in URI. e.g., 'index.html|index.htm|Default.html'
#
# NOTE: This is similar to Apache's "DirectoryIndex" directive.
#
# $DIRECTORY_INDEX = "";
#
# This pattern specifies the Mail/News header fields which
# should be searchable. NOTE: case-insensitive
#
# $REMAIN_HEADER = "From|Date|Message-ID";
#
# This pattern specifies fields which are used for field-specified
# searching. NOTE: case-insensitive
#
# $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";
#
# This pattern specifies meta tags which are used for field-specified
# searching. NOTE: case-insensitive
#
# $META_TAGS = "keywords|description";
#
# This pattern specifies aliases for NMZ.field.* files.
# NOTE: Editing NOT recommended.
#
# %FIELD_ALIASES = ('title' => 'subject', 'author' => 'from');
#
# This pattern specifies HTML elements which should be replaced with
# null string when removing them. Normally, the elements are replaced
# with a single space character.
#
# $NON_SEPARATION_ELEMENTS = 'A|TT|CODE|SAMP|KBD|VAR|B|STRONG|I|EM|CITE|FONT|U|'.
# 'STRIKE|BIG|SMALL|DFN|ABBR|ACRONYM|Q|SUB|SUP|SPAN|BDO';
#===================================================================
#
# Critical Numbers
#
#
# The max size of files which can be loaded in memory at once.
# If you have plenty of memory, you can increase the value.
# If you have little memory, you can decrease the value.
#
# $ON_MEMORY_MAX = 5000000;
# 5MB for 64MB RAM
# $ON_MEMORY_MAX = 5000000;
# 25MB for 256MB RAM
# $ON_MEMORY_MAX = 25000000;
# 50MB for 512MB RAM
$ON_MEMORY_MAX = 50000000;
# 100MB for 1GB RAM
#$ON_MEMORY_MAX = 100000000;
#
# The max file size for indexing. Files larger than this
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because
# binary-formatted files such as PDF and Word documents are larger
# than the text extracted from them.
#
# $FILE_SIZE_MAX = 2000000;
$FILE_SIZE_MAX = 20000000;
#
# The max text size for indexing. Files larger than this
# will be ignored.
#
# $TEXT_SIZE_MAX = 600000;
$TEXT_SIZE_MAX = 6000000;
#
# The max length of a word. Words longer than this will be ignored.
#
# $WORD_LENG_MAX = 128;
#
# Weights for HTML elements which are used for term weighting.
#
# %Weight =
# (
# 'html' => {
# 'title' => 16,
# 'h1' => 8,
# 'h2' => 7,
# 'h3' => 6,
# 'h4' => 5,
# 'h5' => 4,
# 'h6' => 3,
# 'a' => 4,
# 'strong' => 2,
# 'em' => 2,
# 'kbd' => 2,
# 'samp' => 2,
# 'var' => 2,
# 'code' => 2,
# 'cite' => 2,
# 'abbr' => 2,
# 'acronym'=> 2,
# 'dfn' => 2,
# },
# 'metakey' => 32, # for <meta name="keywords" content="foo bar">
# 'headers' => 8, # for Mail/News' headers
# );
#
# The max length of a HTML-tagged string which can be processed for
# term weighting.
# NOTE: Quite a few people misuse <h[1-6]> just to change
# the font size.
#
# $INVALID_LENG = 128;
#
# The max length of a field.
# This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).
#
# $MAX_FIELD_LENGTH = 200;
$MAX_FIELD_LENGTH = 1000;  # must be smaller than BUFSIZE
#===================================================================
#
# Software for handling Japanese text
#
#
# Network Kanji Filter nkf v1.62 or later
#
# $NKF = "module_nkf";
#
# KAKASI
#
# $KAKASI = "module_kakasi -ieuc -oeuc -w";
#
# ChaSen 1.51 or later (simple wakatigaki)
#
# $CHASEN = "chasen -j -F '\%m '";
#
# ChaSen 1.51 or later (with noun words extraction)
#
# $CHASEN_NOUN = "chasen -j -F '\%m %H\\n'";
#
# Default Japanese processor: KAKASI or ChaSen.
#
# $WAKATI = $KAKASI;
#===================================================================
#
# Directories
#
$LIBDIR = 'C:/namazu/share/namazu/pl';
$FILTERDIR = 'C:/namazu/share/namazu/filter';
$TEMPLATEDIR = 'C:/namazu/share/namazu/template';
1;
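To check which files the customized mknmzrc settings will pick up, the filename tests can be reproduced with Python's `re` module, whose syntax agrees with Perl's for these particular patterns. This is a sketch, assuming mknmz matches each pattern case-insensitively against the whole file name (which is why the comments forbid `^` and `$` anchors) and skips files larger than `$FILE_SIZE_MAX`; `is_target` is a hypothetical name. The Perl string escapes `"\\."` and `"\x23"` are written here as the regex `\.` and the character `#`.

```python
import re

# Patterns transcribed from the mknmzrc settings.
HTML_SUFFIX = r"html?|[ps]html|html\.[a-z]{2}|htm|php|asp|jsp|xsp"
ALLOW_FILE = (rf".*\.(?:{HTML_SUFFIX})|.*\.txt"
              r"|.*\.pdf|.*\.ps"
              r"|.*\.tex|.*\.dvi"
              r"|.*\.doc|.*\.xls|.*\.ppt"
              r"|.*\.j[sabf]w|.*\.jtd")
DENY_FILE = (r".*\.(gif|png|jpg|jpeg|exe|zip|msi)|.*\.tar\.gz|core"
             r"|.*\.bak|.*~|\..*|#.*")
FILE_SIZE_MAX = 20_000_000

# Assumption: the whole file name must match, ignoring case.
_allow = re.compile(rf"(?:{ALLOW_FILE})", re.IGNORECASE)
_deny = re.compile(rf"(?:{DENY_FILE})", re.IGNORECASE)


def is_target(filename, size):
    """Return True if a file would be indexed under these settings."""
    if not _allow.fullmatch(filename):
        return False
    if _deny.fullmatch(filename):
        return False
    return size <= FILE_SIZE_MAX
```

For example, `is_target(".hidden.txt", 100)` is False: the name passes `.*\.txt` in the allow list but is rejected by `\..*` in the deny list.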
# This is a Namazu configuration file for namazu or namazu.cgi.
#
# Originally, this file is named 'namazurc-sample', so you should
# copy it to 'namazurc' to make it effective.
#
# Each item must be separated by one or more SPACE or TAB characters.
# You can use a double-quoted string for representing a string which
# contains SPACE or TAB characters, like "foo bar baz".
##
## Index: Specify the default directory.
##
Index C:\namazu\var\namazu\index
##
## Template: Set the template directory containing
## NMZ.{head,foot,body,tips,result} files.
##
#Template C:\namazu\share\namazu\template
##
## Replace: Replace TARGET with REPLACEMENT in URIs in search
## results.
##
## TARGET is specified by Ruby's perl-like regular expressions.
## You can capture sub-strings in TARGET by surrounding them
## with `(' and `)' and use them later as backreferences by
## \1, \2, \3,... \9.
##
## To use meta characters literally such as `*', `+', `?', `|',
## `[', `]', `{', `}', `(', `)', escape them with `\'.
##
## e.g.,
##
## Replace /home/foo/public_html/ http://www.foobar.jp/~foo/
## Replace /home/(.*)/public_html/ http://www.foobar.jp/\1/
## Replace /C\|/foo/ http://www.foobar.jp/
##
## If you do not want this replacement when running namazu from
## the command line, run namazu with the -U option.
##
## You can specify more than one Replace rule, but only the
## first matching rule is applied.
##
#Replace /home/foo/public_html/ http://www.foo.bar.jp/~foo/
Replace /C\|/mydocu~1/ http://hostname/doc/
##
## Logging: Set OFF to turn off keyword logging to NMZ.slog.
## Default is ON.
##
#Logging off
##
## Lang: Set the locale code such as `ja_JP.eucJP', `ja_JP.SJIS',
## `de', etc. This directive works only if the environment
## variable LANG is not set because the directive is mainly
## intended for CGI use. On the shell, you can set the
## environment variable LANG instead of using the directive.
##
## If you set it to `de', namazu.cgi uses
## NMZ.(head|foot|body|tips|results).de for displaying results
## and uses a proper message catalog for `de'.
##
Lang ja_JP.SJIS
##
## Scoring: Set the scoring method "tfidf" or "simple".
##
#Scoring tfidf
##
## EmphasisTags: Set the pair of HTML tags used to emphasize
## keywords in search results.
##
#EmphasisTags "<strong class=\"keyword\">" "</strong>"
##
## MaxHit: Set the maximum number of documents which can be
## handled in query operation. If documents matching a
## query exceed the value, they will be ignored.
##
#MaxHit 10000
##
## MaxMatch: Set the maximum number of words which can be
## handled in regex/prefix/inside/suffix query. If words
## matching a query exceed the value, they will be ignored.
##
#MaxMatch 1000
##
## ContentType: Set the "Content-Type" header to output. If you
## use non-HTML template files, set it accordingly.
##
#ContentType "text/x-hdml"
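The first-match behaviour of the Replace directive can be modelled in a few lines of Python. This is a sketch, assuming a rule's TARGET is matched at the start of the URI and only the matched part is rewritten; Namazu itself uses Ruby's regular-expression engine, but Python's `re` behaves the same for these example patterns. The rule list combines two examples from the namazurc comments with the rule actually configured there; `replace_uri` is a hypothetical name.

```python
import re

# (TARGET, REPLACEMENT) pairs, in the order they would appear in namazurc.
REPLACE_RULES = [
    (r"/home/foo/public_html/", r"http://www.foobar.jp/~foo/"),
    (r"/home/(.*)/public_html/", r"http://www.foobar.jp/\1/"),
    (r"/C\|/mydocu~1/", r"http://hostname/doc/"),
]


def replace_uri(uri, rules=REPLACE_RULES):
    """Apply only the first rule whose TARGET matches the start of uri."""
    for target, replacement in rules:
        m = re.match(target, uri)
        if m:
            # expand() fills in \1..\9 from the captured groups.
            return m.expand(replacement) + uri[m.end():]
    return uri  # no rule matched: the URI is shown unchanged
```

Because only the first matching rule applies, `/home/foo/public_html/a.html` is rewritten by the specific `~foo` rule and never reaches the generic `(.*)` rule that follows it.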
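The Scoring directive chooses between "tfidf" and "simple". As a rough illustration only, textbook tf-idf multiplies a term's frequency in a document by log(N/df), where N is the number of indexed documents and df the number that contain the term, so terms found everywhere count for little. This sketch assumes "simple" means ranking by raw term frequency; Namazu's exact formulas may differ, and `score` is a hypothetical function.

```python
import math


def score(tf, df, n_docs, method="tfidf"):
    """Score one term in one document.

    tf: occurrences of the term in the document
    df: number of documents containing the term
    n_docs: total number of documents in the index
    """
    if method == "simple":
        return float(tf)  # assumption: raw term frequency
    if method == "tfidf":
        return tf * math.log(n_docs / df)
    raise ValueError(f"unknown scoring method: {method}")
```

A term that appears in every document scores zero under tf-idf, whereas under the simple method it still counts once per occurrence.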