<html>
<head>
<title>Namazu 2.0.12 for Win on IIS 5.0</title>
<LINK REL="stylesheet" TYPE="text/css" HREF="../css0.css">
</head>
<body>

<!--#include virtual="/doc/header.html" -->

<p align=right>Last updated 11 Feb 2004</p>
<h1>Namazu 2.0.12 for Win on IIS 5.0</h1>

<p>Namazu is a full-text search engine intended for easy use. It works not only as a small or medium-scale Web search engine, but also as a personal search system for email or other files.</p>

<p>You can use it to start a document management system in your enterprise. It is a powerful tool for indexing shared documents such as MS Word, MS Excel and PDF files, and for searching them from a web page so that a working group can share information. Individual files are isolated and hard to reuse unless there is an index that leads to them. Think about the Internet: it would be useless without a search engine.</p>

<p>Namazu may replace some databases in your organization, because you do not need to enter data into a database manually. All you need to do is keep all documents in one location and index it regularly (for example, daily).</p>

<p>(The Japanese word `Namazu' means `catfish' in English.) </p>

<a href="http://www.namazu.org/windows/index.html.en">Home Page</a>

<h2>Download</h2>

<table border='1' cellspacing='0' cellpadding='0'>
<tr><th><a href="http://www.namazu.org/windows/index.html.en">
        Namazu 2.0.12 for Win32</a>
        <td><a href="http://www.namazu.org/win32/nmz2012.exe">nmz2012.exe</a>
<tr><th><a href="http://aspn.activestate.com/ASPN/Downloads/ActivePerl/">
        Active Perl</a>
        <td>ActivePerl-5.6.1.635-MSWin32-x86.msi (download build 600 series, not 800 series)
</table>

<h2>Installation</h2>
<ol>
<li>Make sure IIS is already installed. Otherwise, ActivePerl cannot install its script support into IIS.
<li>Install ActivePerl (ActivePerl-5.6.1.635-MSWin32-x86.msi, the 600 series build listed under Download). Make sure you install it on the C: drive; otherwise the Namazu installer cannot install its module into the Perl directory. You might need to restart the computer if the installer shows a restart dialog box.
<li>Run <b>nmz2012.exe</b>.<br>
<li>Click the left "??" button; its label is supposed to be "OK". Do not change the directory or the drive, or you may face problems later. Keep the defaults as they are.<br>
<img src="namazu_inst01.gif"><br>

<li>When the command prompt appears, the installer asks whether to install the Perl modules. Answer "yes" both times.<br>

<li>At the end of the installation you will see an unknown message like the one below. You can ignore it.<br>
<img src="namazu_inst02.gif"><br>

<li>Restart your computer.
</ol>
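<p>To confirm that both installations succeeded, you can ask each program for its version from a command prompt. This is only a sketch: it assumes the default locations C:\perl and C:\namazu, and that the Namazu installer placed <b>namazu.exe</b> in C:\namazu\bin.</p>
<pre>
REM Print the ActivePerl version and build number.
C:\perl\bin\perl.exe -v

REM Print the Namazu version.
C:\namazu\bin\namazu.exe --version
</pre>
<p>If either command is not found, check the installation directory before going further.</p>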

<h2>For PDF file indexing</h2>
<ol>
<li>Download  <a href="ftp://ftp.foolabs.com/pub/xpdf/">xpdf-3.00-win32.zip</a>

<li>Download <a href="http://www.csa.ru/ftp/arch/win/">gzip124xN.zip</a>


<li>Unzip gzip124xN.zip and copy gzip.exe into C:\namazu\bin
<li>Unzip xpdf-3.00-win32.zip and copy pdftotext.exe into C:\namazu\bin 

</ol>
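<p>A quick way to check the PDF setup is to confirm that both helper programs are in place and then ask mknmz to print its configuration, which should include the document types it can handle. The sketch below assumes the default C:\perl and C:\namazu paths; the exact output of <b>mknmz -C</b> may differ between versions.</p>
<pre>
@echo off
REM Check that the helper programs were copied to the Namazu bin directory.
IF EXIST C:\namazu\bin\gzip.exe (echo gzip.exe found) ELSE (echo gzip.exe is missing)
IF EXIST C:\namazu\bin\pdftotext.exe (echo pdftotext.exe found) ELSE (echo pdftotext.exe is missing)

REM Show the mknmz configuration, including the supported media types.
C:\perl\bin\perl.exe C:\namazu\bin\mknmz -C
</pre>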

<h2>Make an Index</h2>
<ol>
<li>Create a batch file that builds the index (for example, makeindex.bat).<br>
<li>The following example builds an index for the <b>C:\Inetpub\wwwroot\</b> folder:

<pre>
@echo off

REM ---- Contents directory ----
SET D_CT=C:\Inetpub\wwwroot

REM ---- System parameters ----
SET F_PL=C:\perl\bin\perl.exe
SET D_NZ=C:\namazu
SET F_MK=%D_NZ%\bin\mknmz -s -U -O
SET F_GC=%D_NZ%\bin\gcnmz -v
SET D_IX=%D_NZ%\var\namazu\index
SET F_LG=%D_NZ%\mknmz.log
SET F_ER=%D_NZ%\mknmz_err.log

REM ---- Remove lock file (left behind by an unexpected failure) ----
IF EXIST %D_IX%\NMZ.lock2 DEL %D_IX%\NMZ.lock2

echo ---- Indexing ----
%F_PL% %F_MK% "%D_IX%" "%D_CT%" 1>"%F_LG%" 2>"%F_ER%"

echo ---- Garbage collection ----
%F_PL% %F_GC% "%D_IX%" 1>>"%F_LG%" 2>>"%F_ER%"
</pre>

<li>Note that previous versions of Namazu could not handle long file paths, but this version seems to work fine. You can index a folder such as C:\Documents and Settings\your.name\My Documents.

<li>Run the batch file to build the index, or schedule it to run automatically.

<li>If you want to index a shared network folder and connect to it automatically from the batch file, add the following commands and adjust them to your environment:
<pre>
REM ---- Connect remote directory ----
net use * /d /yes
net use Z: /persistent:yes \\srv\share
</pre>

</ol>
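<p>One way to schedule the batch file is the built-in <b>at</b> command, which requires the Task Scheduler service to be running. The example below is a sketch that assumes the batch file was saved as C:\namazu\makeindex.bat (a hypothetical location) and that 02:00 is a quiet time on your server.</p>
<pre>
REM Run the indexing batch file every day at 02:00.
at 02:00 /every:M,T,W,Th,F,S,Su cmd /c C:\namazu\makeindex.bat

REM List the scheduled jobs to confirm that the entry was added.
at
</pre>
<p>Jobs started by <b>at</b> run under the system account by default, which may not have access to network shares, so this approach suits local folders best.</p>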

<h2>Configuration</h2>

<p>For indexing, edit C:\namazu\etc\namazu\mknmzrc and adjust the parameters to your requirements. To speed up indexing, you can raise the ON_MEMORY_MAX parameter according to the amount of RAM, as shown below.</p>
<table border='1' cellspacing='0' cellpadding='0'>
<tr><th>$ON_MEMORY_MAX<th>RAM
<tr><td>5000000 (5MB, default value)<td>64MB
<tr><td>25000000 (25MB)<td>256MB
<tr><td>50000000 (50MB)<td>512MB
<tr><td>100000000 (100MB)<td>1GB
</table>

<blockquote><pre>
#
# This is a Namazu configuration file for mknmz.
#
package conf;  # Don't remove this line!

#===================================================================
#
# Administrator's email address
#
# $ADDRESS = 'webmaster@foo.bar.jp';
<b>$ADDRESS = 'yourname@yourdomain.com';</b>

#===================================================================
#
# Regular Expression Patterns
#

#
# This pattern specifies HTML suffixes.
#
# $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";

<b>$HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}|htm|php|asp|jsp|xsp";</b>

#
# This pattern specifies file names which will be targeted.
# NOTE: It can be specified by --allow=regex option.
#       Do NOT use `$' or `^' anchors.
#       Case-insensitive.
#
# $ALLOW_FILE = ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
#               "|.*\\.gz|.*\\.Z|.*\\.bz2" .       # Compressed files
#               "|.*\\.pdf|.*\\.ps" .              # PDF, PostScript
#               "|.*\\.tex|.*\\.dvi" .             # TeX, DVI
#               "|.*\\.rpm|.*\\.deb" .             # RPM, DEB
#               "|.*\\.doc|.*\\.xls|.*\\.ppt" .    # Word, Excel, PowerPoint
#               "|.*\\.j[sabf]w|.*\\.jtd" .        # Ichitaro 4, 5, 6, 7, 8
#               "|\\d+|[-\\w]+\\.[1-9n]";          # Mail/News, man
<b>
$ALLOW_FILE =   ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
                "|.*\\.pdf|.*\\.ps" .              # PDF, PostScript
                "|.*\\.tex|.*\\.dvi" .             # TeX, DVI
                "|.*\\.doc|.*\\.xls|.*\\.ppt" .    # Word, Excel, PowerPoint
                "|.*\\.j[sabf]w|.*\\.jtd";         # Ichitaro 4, 5, 6, 7, 8
</b>
#
# This pattern specifies file names which will NOT be targeted.
# NOTE: It can be specified by --deny=regex option.
#       Do NOT use `$' or `^' anchors.
#       Case-insensitive.
#
# $DENY_FILE = ".*\\.(gif|png|jpg|jpeg)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";<b>
$DENY_FILE = ".*\\.(gif|png|jpg|jpeg|exe|zip|msi)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";</b>

#
# This pattern specifies PATHNAMEs which will NOT be targeted.
# NOTE: Usually specified by --exclude=regex option.
#
# $EXCLUDE_PATH = undef;

#
# This pattern specifies file names which can be omitted 
# in URI.  e.g., 'index.html|index.htm|Default.html'
#
# NOTE: This is similar to Apache's "DirectoryIndex" directive.
#
# $DIRECTORY_INDEX = "";

#
# This pattern specifies Mail/News's fields in its header which 
# should be searchable.  NOTE: case-insensitive
#
# $REMAIN_HEADER = "From|Date|Message-ID";

#
# This pattern specifies fields which used for field-specified 
# searching.  NOTE: case-insensitive
# 
# $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";

#
# This pattern specifies meta tags which used for field-specified 
# searching.  NOTE: case-insensitive
#
# $META_TAGS = "keywords|description";

#
# This pattern specifies aliases for NMZ.field.* files.
# NOTE: Editing NOT recommended.
#
# %FIELD_ALIASES = ('title' =&gt; 'subject', 'author' =&gt; 'from');

#
# This pattern specifies HTML elements which should be replaced with 
# null string when removing them. Normally, the elements are replaced 
# with a single space character.
#
# $NON_SEPARATION_ELEMENTS = 'A|TT|CODE|SAMP|KBD|VAR|B|STRONG|I|EM|CITE|FONT|U|'.
#                        'STRIKE|BIG|SMALL|DFN|ABBR|ACRONYM|Q|SUB|SUP|SPAN|BDO';

#===================================================================
# 
# Critical Numbers
# 

# 
# The max size of files which can be loaded in memory at once.
# If you have much memory, you can increase the value.
# If you have less memory, you can decrease the value.
#
# $ON_MEMORY_MAX   = 5000000;
<b>
# 5MB for 64MB RAM
# $ON_MEMORY_MAX = 5000000;

# 25MB for 256MB RAM
# $ON_MEMORY_MAX = 25000000;

# 50MB for 512MB RAM
$ON_MEMORY_MAX = 50000000;

# 100MB for 1GB RAM
#$ON_MEMORY_MAX = 100000000;
</b>
#
# The max file size for indexing. Files larger than this 
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because 
#       binary-formated files such as PDF, Word are larger.
#
# $FILE_SIZE_MAX   =    2000000;
<b>$FILE_SIZE_MAX   =  20000000;</b>

#
# The max text size for indexing. Files larger than this 
# will be ignored.
#
# $TEXT_SIZE_MAX   =     600000;
<b>$TEXT_SIZE_MAX   =   6000000;</b>

#
# The max length of a word. the word longer than this will be ignored.
#
# $WORD_LENG_MAX   = 128;

#
# Weights for HTML elements which are used for term weightning.
#
# %Weight = 
#     (
#      'html' =&gt; {
#          'title'  =&gt; 16,
#          'h1'     =&gt; 8,
#          'h2'     =&gt; 7,
#          'h3'     =&gt; 6,
#          'h4'     =&gt; 5,
#          'h5'     =&gt; 4,
#          'h6'     =&gt; 3,
#          'a'      =&gt; 4,
#          'strong' =&gt; 2,
#          'em'     =&gt; 2,
#          'kbd'    =&gt; 2,
#          'samp'   =&gt; 2,
#          'var'    =&gt; 2,
#          'code'   =&gt; 2,
#          'cite'   =&gt; 2,
#          'abbr'   =&gt; 2,
#          'acronym'=&gt; 2,
#          'dfn'    =&gt; 2,
#      },
#      'metakey' =&gt; 32, # for &lt;meta name="keywords" content="foo bar"&gt;
#      'headers' =&gt; 8,  # for Mail/News' headers
# );

#
# The max length of a HTML-tagged string which can be processed for
# term weighting. 
# NOTE: There are not a few people has a bad manner using 
#       &lt;h[1-6]&gt; for changing a font size.
#
# $INVALID_LENG = 128; 

#
# The max length of a field.
# This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).
#
# $MAX_FIELD_LENGTH = 200;

<b>$MAX_FIELD_LENGTH = 1000;</b>


#===================================================================
#
# Softwares for handling a Japanese text
#

#
# Network Kanji Filter nkf v1.62 or later
#
# $NKF = "module_nkf"; 

#
# KAKASI
#
# $KAKASI = "module_kakasi -ieuc -oeuc -w";

#
# ChaSen 1.51 or later (simple wakatigaki)
#
# $CHASEN = "chasen -j -F '\%m '";

#
# ChaSen 1.51 or later (with noun words extraction)
#
# $CHASEN_NOUN = "chasen -j -F '\%m %H\\n'";

#
# Default Japanese processer: KAKASI or ChaSen.
#
# $WAKATI  = $KAKASI;


#===================================================================
#
# Directories
#
$LIBDIR = 'C:/namazu/share/namazu/pl';
$FILTERDIR = 'C:/namazu/share/namazu/filter';
$TEMPLATEDIR = 'C:/namazu/share/namazu/template';

1;
</pre></blockquote>

<p>For searching, edit C:\namazu\etc\namazu\namazurc. In particular, set the "Replace" parameter so that the local path stored in the index is rewritten to the corresponding path on the web server.</p>
<blockquote><pre>
# This is a Namazu configuration file for namazu or namazu.cgi.
#
#  Originally, this file is named 'namazurc-sample'.  so you should
#  copy this to 'namazurc' to make the file effective.
#  
#  Each item is must be separated by one or more SPACE or TAB characters. 
#  You can use a double-quoted string for represanting a string which 
#  contains SPACE or TAB characters like "foo bar baz".


##
## Index: Specify the default directory.
## 
Index         C:\namazu\var\namazu\index


##
## Template: Set the template directory containing
## NMZ.{head,foot,body,tips,result} files.
##
#Template      C:\namazu\share\namazu\template


##
## Replace: Replace TARGET with REPLACEMENT in URIs in search
## results.  
##
## TARGET is specified by Ruby's perl-like regular expressions.  
## You can caputure sub-strings in TARGET by surrounding them 
## with `(' and `)'and use them later as backreferences by
## \1, \2, \3,... \9.
##  
## To use meta characters literally such as `*', `+', `?', `|', 
## `[', `]', `{', `}', `(', `)', escape them with `\'.
##  
## e.g.,
##  
##    Replace  /home/foo/public_html/   http://www.foobar.jp/~foo/
##    Replace  /home/(.*)/public_html/  http://www.foobar.jp/\1/
##    Replace   /C\|/foo/               http://www.foobar.jp/
##  
## If you do not want to do the processing on command line use, 
## run namazu with -U option.
##
## You can specify more than one Replace rules but the only 
## first-matched rule are applied. 
##
#Replace       /home/foo/public_html/  http://www.foo.bar.jp/~foo/
<b>Replace   /C\|/mydocu~1/               http://hostname/doc/</b>


##
## Logging: Set OFF to turn off keyword logging to NMZ.slog. 
## Default is ON.
##
#Logging       off


##
## Lang: Set the locale code such as `ja_JP.eucJP', `ja_JP.SJIS', 
## `de', etc.  This directive works only if the environment 
## variable LANG is not set because the directive is mainly 
## intended for CGI use.  On the shell, You can set 
## environemtnt variable LANG instead of using the directive.
## 
## If you set `de' to it, namazu.cgi use 
## NMZ.(head|foot|body|tips|results).de for displaying results 
## and use a proper message catalog for `de'.
##
Lang          ja_JP.SJIS


##
## Scoring: Set the scoring method "tfidf" or "simple".
##
#Scoring       tfidf


##
## EmphasisTags: Set the pair of html elements which is used in
## keyword emphasizing for search results.
##
#EmphasisTags  "&lt;strong class=\"keyword\"&gt;"   "&lt;/strong&gt;"

##
## MaxHit: Set the maximum number of documents which can be
## handled in query operation.  If documents matching a
## query exceed the value, they will be ignored.
##
#MaxHit 10000

##
## MaxMatch: Set the maximum number of words which can be
## handled in regex/prefix/inside/suffix query. If documents
## matching a query exceed the value, they will be ignored.
##
#MaxMatch       1000

##
## ContentType: Set "Content-Type" header output. If you want to
## use non-HTML template files, set it suitably.
#ContentType    "text/x-hdml"
</pre></blockquote>
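<p>You can check the Replace rule from a command prompt before testing in a browser. The commands below are a sketch: they assume that <b>namazu.exe</b> is in C:\namazu\bin, and "namazu" is only a sample query word that should occur in your indexed documents.</p>
<pre>
REM Show the settings that namazu actually loaded from namazurc.
C:\namazu\bin\namazu.exe -C

REM List only the URIs of the documents that match the sample query.
C:\namazu\bin\namazu.exe -l "namazu" C:\namazu\var\namazu\index
</pre>
<p>If the listed URIs start with http://hostname/doc/ rather than /C|/mydocu~1/, the Replace rule is being applied.</p>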

<h2>How to search from Browser</h2>
<ol>
<!--
<li>Copy <b>C:\namazu\libexec\namazu.cgi.exe</b> into <b>C:\Inetpub\Scripts\namazu\</b>

<li>If your C drive is NTFS file format, your c:\namazu folder needs a <b>[Read] permission</b> for [Everyone] group's.<br>
<img src="namazu01.gif">
-->

<li>Open a browser and go to <b>http://localhost/scripts/namazu/namazu.cgi</b><br>
Note: on Windows XP the URL might be http://localhost/<b>bin</b>/namazu/namazu.cgi<br>
<img src="namazu02.gif">

<li>Configure IIS so that your documents are reachable from the web. You can either change the home directory or create a virtual directory.

</ol>


<h2>Problems</h2>
<p>You may see the following problems during indexing, especially when you index a large number of files.</p>
<ol>
<li>While indexing with mknmz, you may see the following error, with a different directory each time: "Can't cd to (...)TAP4: No such file or directory...at C:\Perl/lib/File/Find.pm line535.". It occurs especially when you index a network drive. You can exclude the affected folder to avoid the problem.</li>
<li>MS Word crashes while the index is being built.</li>
<li>"pdftotext.exe has encountered a problem and needs to close."</li>
</ol>
<p>If one of these problems occurs and mknmz cannot continue indexing, kill the MS Word process and try to continue.
If that does not help, restart the computer and start indexing again from the beginning.
A possible workaround is to exclude some folders and start with a small number of folders.</p>

<p>Always monitor the progress to check whether indexing is still running or has frozen.</p>
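<p>The recovery steps above can be sketched as a small batch file. It assumes the index directory used by the indexing batch file in "Make an Index". <b>taskkill</b> ships with Windows XP Professional and later, so on Windows 2000 end the WINWORD.EXE process from Task Manager instead. The pattern TAP4 in the last line is only a placeholder for the folder that causes the error.</p>
<pre>
@echo off
SET D_IX=C:\namazu\var\namazu\index

REM Stop a hung MS Word process.
taskkill /F /IM WINWORD.EXE

REM Remove the stale lock file so that mknmz can start again.
IF EXIST "%D_IX%\NMZ.lock2" DEL "%D_IX%\NMZ.lock2"

REM In the indexing batch file, add --exclude to skip a problem folder.
REM Keep -O last, because the index directory follows it on the command line.
REM SET F_MK=%D_NZ%\bin\mknmz -s -U --exclude=TAP4 -O
</pre>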

<!--
<li>If you can not make PDF file index during <b>mknmz</b>, as below
<blockquote><pre>
pdftotext version 1.00
Copyright 1996-2002 Derek B. Noonburg
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -raw              : keep strings in content stream order
  -htmlmeta         : generate a simple HTML file, including the meta information  -enc <string>     : output text encoding name
  -eol <string>     : output end-of-line convention (unix, dos, or mac)
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -q                : don't print any messages or errors
  -cfg <string>     : configuration file to use in place of .xpdfrc
  -v                : print copyright and version info
  -h                : print usage information
  -help             : print usage information
  --help            : print usage information
  -?                : print usage information
1/2 - /C|/mydocu~1/doc/hoge.pdf Unable to convert pdf file (maybe cop
ying protection)
</pre></blockquote>

<li>Edit C:\namazu\share\namazu\filter\pdf.pl line 74 and remove <b>-eucjp</b> option
<blockquote><pre>
# system("$pdfconvpath -q <b>-eucjp</b> -raw $tmpfile $tmpfile2");
system("$pdfconvpath -q -raw $tmpfile $tmpfile2");
</pre></blockquote>
-->


<hr><a href="../index.html">Back</a>
- <a href="../../support.html">Support</a>

<!--#include virtual="/doc/footer.html" -->

</body>
</html>
