De siesta en siesta

hugo’s blog

Installing MG4J search engine on Cygwin and Linux

Spent some time installing MG4J on both Windows and Linux. Here are my findings (thanks Vigna and Boldi for all the help):

Eclipse – Cygwin Installation for Windows:

Prerequisites:

  • optional: install rlwrap for cygwin

Installation

  • Create an Eclipse project from “existing Ant BuildFile” and choose the build.xml file in the sources-folder
  • Add to Project’s build-path all jars in the dependencies-folder
  • Eclipse project should build now…

Indexing and Querying from Cygwin’s bash
You can now index and query through Eclipse using the Run interface.

However, if you would like to do this from the command line, you need to work a little more.

I am using Cygwin’s bash. The biggest problem I had was setting the classpath. You need to set it from the Cygwin shell (bash in my case) but in WINDOWS format :( I wrote a little Perl script to do this:

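#!/usr/bin/perl -w
# Hugo Zaragoza, 2008.
# BUILDS Windows java classpath line for cygwin bash
use strict;
die("ARGUMENTS: files\n") unless @ARGV >= 1;   # accept one or more paths
my @entries;
foreach my $d (@ARGV) {
    $d =~ s/^\/cygdrive\/c/c:/;   # Cygwin drive prefix -> Windows drive letter
    $d =~ s/\//\\/g;              # forward slashes -> backslashes
    push @entries, $d;
}
print join(";", @entries);        # ';' is the Windows classpath separator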

Then you can set the path in bash with (substituting the paths of your dependency and source folders):
export CLASSPATH=`perl ./winCLASSPATH.pl /cygdrive/c/dependency-folder/*jar /cygdrive/c/source-folder/build`
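
The same conversion can also be sketched directly in the shell, without Perl. This is only an illustration of the idea above: the function name to_win_classpath is made up here and is not part of MG4J or Cygwin.

```shell
# to_win_classpath PATH...
# Hypothetical helper: turns Cygwin-style paths into one Windows-style
# classpath string (drive letter, backslashes, ';' as separator).
to_win_classpath() {
    result=""
    for p in "$@"; do
        # /cygdrive/c/foo -> c:/foo, then every '/' -> '\'
        converted=$(printf '%s\n' "$p" |
            sed -e 's|^/cygdrive/\([a-zA-Z]\)/|\1:/|' -e 's|/|\\|g')
        # append, adding ';' only between entries
        result="${result:+$result;}$converted"
    done
    printf '%s\n' "$result"
}

# Usage (paths are placeholders):
#   export CLASSPATH=$(to_win_classpath /cygdrive/c/dependency-folder/*jar /cygdrive/c/source-folder/build)
```

Cygwin also ships a cygpath utility whose -w and -p options convert a colon-separated path list to Windows form; if it is available on your installation, that may be the simpler route.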

Now you can build your indices and query the engine… for example (remember to use Windows paths):

# BUILD INDEX
java -Xmx512M it.unimi.dsi.mg4j.document.TRECDocumentCollection trec.collection c:/HUGO/DATA/trecFiles*
java -Xmx512M it.unimi.dsi.mg4j.tool.IndexBuilder -Itoken -S trec.collection trec
...

and you are ready to query the engine:

# START ENGINE
rlwrap -H ~/.Query-history java -Xmx512M it.unimi.dsi.mg4j.query.Query -i GenericItem trec-token -h -c trec.collection -v

Now you can use the command-line interface or check http://localhost:4242/Query

Unix Installation

No, you are NOT stupid: it took me 4 days of emails with my system administrator and with Vigna and Boldi (the authors of MG4J!) to get this working.

Prerequisites
First, root needed to install these in our system:

  • You need a recent Ant with some extras:
    • install ant version >= 1.7.0 (otherwise it does NOT work, believe me… :(
    • install xml-commons-api
    • reinstall java (in my case xml-commons-api removes it!)
    • install ant-nodeps (in .rpm) or ant-optional (in .deb)
  • (optional) install “rlwrap” to have a nicer command line access to mg4j
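
To check the Ant version requirement before going further, something like the following sketch may help. It assumes that "ant -version" prints a line containing "version X.Y.Z" and that your sort supports -V (GNU coreutils does); the function names are invented for this example.

```shell
# extract_version: read text on stdin, print the first dotted number
# that follows the word "version" (assumed output format of ant -version).
extract_version() {
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_ge A B: succeed if dotted version A is >= B.
# Relies on "sort -V" (version sort), which is a GNU extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   version_ge "$(ant -version | extract_version)" 1.7.0 || echo "Ant is too old"
```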

Installation:

  • download & untar dependencies (http://mg4j.dsi.unimi.it/mg4j-deps.tar.gz in my case) into a <dependency-directory>
  • download & untar source (http://mg4j.dsi.unimi.it/mg4j-2.1.1-src.tar.gz in my case) into a <source-directory>
  • Go to the top directory in your <source-directory>, you should see a build.properties file.
  • modify build.properties second line: jar.base=<dependency-directory>
  • set your classpath to point to the dependency directory (ant does not need it, but javacc does). Here is a nice trick to do this: export CLASSPATH=$(ls -1 <dependency-dir>/*jar | paste -s -d:)
  • run “ant jar” in the source directory, where the build.xml file is.
  • I get the following warnings, but it builds!!!
    • [taskdef] Could not load definitions from resource emma_ant.properties. It could not be found.
      [taskdef] Could not load definitions from resource checkstyletask.properties. It could not be found.
      [taskdef] build.xml:165: Warning: taskdef class edu.umd.cs.findbugs.anttask.FindBugsTask cannot be found
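
The classpath trick from the steps above can also be written without parsing the output of ls. This is a minimal sketch; join_classpath is a hypothetical helper name, and the paths in the usage line are placeholders.

```shell
# join_classpath ENTRY...
# Joins the given jars/directories with ':' into a single classpath string.
# Entries that do not exist (for example an unmatched glob) are skipped,
# and directories are kept as single entries rather than expanded.
join_classpath() {
    cp=""
    for entry in "$@"; do
        [ -e "$entry" ] || continue
        cp="${cp:+$cp:}$entry"
    done
    printf '%s\n' "$cp"
}

# Usage:
#   export CLASSPATH=$(join_classpath /path/to/dependency-dir/*.jar)
```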

Indexing and Querying:

Ok you are almost there :)

With the above you should be able to build an index and run queries on the command line (as described in the Windows section above), if you set your classpath correctly:

export CLASSPATH=$(ls -1d <dependency-dir>/*jar <source-dir>/build | paste -s -d:)
java -Xmx512M it.unimi.dsi.mg4j.document.TRECDocumentCollection trec.collection trecFiles*
java -Xmx512M it.unimi.dsi.mg4j.tool.IndexBuilder -Itoken -S trec.collection trec
rlwrap -H ~/.Query-history java -Xmx512M it.unimi.dsi.mg4j.query.Query -i GenericItem trec-token -h -c trec.collection -v

However, if you try to access the search engine through HTTP, it will probably break with an error about query.velocity not being found. The fix I found is to pass the path to the query.velocity file with -Dit.unimi.dsi.mg4j.query.QueryServlet.template= (the path is relative to a directory in the classpath):

export CLASSPATH=$(ls -1d <dependency-dir>/*jar <source-dir>/build | paste -s -d:)

velocity='../it/unimi/dsi/mg4j/query/query.velocity'
rlwrap -H ~/.Query-history java -Xmx512M -Dit.unimi.dsi.mg4j.query.QueryServlet.template=$velocity it.unimi.dsi.mg4j.query.Query -i GenericItem trec-token -h -c trec.collection -v
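
If the template still cannot be found, it can help to check which classpath directories a given relative path actually resolves against. The sketch below assumes, as described above, that the template is looked up relative to directory entries of the classpath; find_on_classpath is a made-up helper for debugging, not something MG4J provides.

```shell
# find_on_classpath RELATIVE_PATH CLASSPATH
# Prints every directory entry of the ':'-separated classpath under which
# RELATIVE_PATH exists as a regular file. Jar entries are ignored.
find_on_classpath() {
    rel=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do          # unquoted on purpose: split on ':'
        if [ -d "$dir" ] && [ -f "$dir/$rel" ]; then
            printf '%s\n' "$dir"
        fi
    done
    IFS=$old_ifs
}

# Usage:
#   find_on_classpath "$velocity" "$CLASSPATH"
```

If nothing is printed, the relative path does not match any directory on the classpath and probably needs adjusting.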

Installing PerlDL for Windows Cygwin

Wanted to play with PerlDL as an alternative to numpy…

I thought it would be easier to install under cygwin with CPAN but I kept getting the error:

-- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible

After looking around and wasting a lot of time I found out that:

  1. I needed to install gcc on cygwin using its setup.exe
  2. Perl’s IO::Tty would not install

[TO BE CONTINUED]

Installing Numpy Scipy Numerical for Python on Windows XP

(only intended for complete beginners on this topic… like me.)

I wanted to try using python for some data analysis that I need to do.

I spent a few hours trying to figure out what I needed and how to install it… it is extremely confusing to tell apart scipy and numpy and Numeric and…

Anyway, once you know what to do it is quite simple :

I could not figure out how to install Numpy on Cygwin… can you help?

;)