De siesta en siesta

hugo’s blog

String Manipulation example in IMPURE

Pre-processing a text file in Impure

Here is a simple way to pre-process a text file in Impure using the ReplaceSubstrings control.

I wanted to analyse the book “Alice in Wonderland” using Impure’s linguistic analysis tools. The text of the book is available from the Gutenberg project in .txt format.

Loading this file into Impure and visualising it is very easy with a FileLoader and a SimpleStringVisualizator.

The SimpleStringVisualizator goes a little mad when the file to be visualized is too large, so for debugging purposes I decided to work with only a small part of the text, which I obtained using the getSubstring control.

OK, now we can see the file. But if you look closely at the lines, you will notice that paragraphs are broken by line breaks. More specifically, in this file lines are separated by line breaks (actually \r\n), as in the original printed book, and paragraphs are separated by double line breaks (\r\n\r\n). Impure does a good job of converting all kinds of line breaks into Impure line breaks, so we don’t need to worry about line break encodings. However, line breaks are problematic because one sentence can be broken across two or more lines, which may confuse the linguistic analyser.

What we need to do is to reconnect the sentences. This could be done simply by replacing line breaks with spaces using a replaceSubstring control. However, this would eliminate paragraphs. If we want to preserve paragraph information, we need to:

  1. replace paragraph breaks (double line breaks) with a temporary marker,
  2. replace single line breaks with spaces (this puts sentences back together),
  3. replace the temporary marker with a single line break, recovering the paragraphs.

These three operations can be done using three replaceSubstring controls, or with a single ReplaceSubstrings control.
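Impure is a visual language, so there is no code to paste here, but the same three replacements can be sketched in Python for readers who want to try the idea outside Impure. This is only an illustration: the function name and the marker string are made up for the example, and the marker is assumed not to occur in the text.

```python
# Hypothetical marker; any string that cannot appear in the text will do.
PARAGRAPH_MARKER = "\x00PARAGRAPH\x00"


def rejoin_sentences(text):
    """Join hard-wrapped lines while keeping one line break per paragraph."""
    # Normalise line break encodings first (Impure does this on load).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # 1. Paragraph breaks (double line breaks) become a temporary marker.
    text = text.replace("\n\n", PARAGRAPH_MARKER)
    # 2. Remaining single line breaks become spaces.
    text = text.replace("\n", " ")
    # 3. The marker becomes a single line break again.
    return text.replace(PARAGRAPH_MARKER, "\n")


if __name__ == "__main__":
    sample = (
        "Alice was beginning\r\nto get very tired.\r\n\r\n"
        "So she was considering\r\nin her own mind."
    )
    print(rejoin_sentences(sample))
```

The order of the steps matters: if single line breaks were replaced first, the paragraph breaks would be lost as well.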

We now have a string where each line break indicates a paragraph. It is now very easy to pick a few paragraphs for linguistic processing. To select a single paragraph, use a splitString by line break and then a getElementFromList to pick the desired paragraph.
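As a rough Python equivalent of the splitString plus getElementFromList combination (again just a sketch, with a hypothetical function name and a made-up sample string), selecting a paragraph is a split followed by an index:

```python
def get_paragraph(text, index):
    """Return the paragraph at `index`, assuming one paragraph per line."""
    paragraphs = text.split("\n")  # plays the role of splitString
    return paragraphs[index]       # plays the role of getElementFromList


if __name__ == "__main__":
    text = "First paragraph.\nSecond paragraph.\nThird paragraph."
    print(get_paragraph(text, 1))
```

Indices start at 0 in this sketch; whether Impure's getElementFromList counts from 0 or 1 is not something the post states.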

Vodafone Customer Service: incompetents and liars.

This is a summary of my adventure trying to get Vodafone to replace my passport number with my DNI number on my contract:

Store visit 1: you have to call customer service.

Customer service call 1: you have to go to the store.

Store visit 2: you have to call customer service. Well, you call them, because they don’t listen to me.

Customer service call 2 (from the store): we call, and this time they make the change! Perfect! You will receive an SMS within 48 hours to confirm.

(48 hours pass)

Customer service call 3: Has the change been made? No.

Customer service call 4: Has the change been made? No.

Customer service call 5: Has the change been made? “But did you send the fax?” What?! Yes, you have to send a fax. Excuse me?!

Store visit 3: They know nothing; we have to call.

Customer service call 6 (from the store): We call and they give us the details that have to be sent by fax, and the number.

Fax 1 sent. And just in case, the store decides to send it a second time:

Fax 2 sent. The change will be made in 4-7 days!

(7 days pass)

Customer service call 7: Has the change been made? No, you must wait 1-2 weeks.

(5 days pass)

Customer service call 8: Has the change been made? No, but the system says it is in progress. Good!

(2 days pass)

Customer service call 9: Has the change been made? No. But is it in progress? No. But have you received the fax? Apparently not. You have to send another fax; the first one “didn’t go in”. But I have the receipt showing that you received the fax. One moment please… Yes, it may have been received, but it “didn’t get into the system”. Well, put it into the system then, no? No, you have to send a fax. Look, miss, I have spent 2 weeks on this, with I don’t know how many calls and visits, so find my fax and put it into the system. You have to send another fax. Well, I am not going to send it. Do you have any other questions? Before you hang up we would like to tell you about our new offers, blah…

GOOD BYE VODAFONE! Welcome Yoigo!

What is happening in Iran, and the new world of social media

I want to know what is happening in Iran. I search the internet, and what I find are not headlines from the BBC or El País, but blogs, tweets and emails. Reading them makes my hair stand on end. And I think that is a good thing. Judge for yourselves:

It is a bit like having breakfast with students from Tehran who have just come back from last night’s protests…

CommonTag is out

Bibtex Crossrefs in Lyx

By default, when two or more cited entries share the same crossref, BibTeX adds the cross-referenced entry to the bibliography as an additional reference.

To stop this in LaTeX you need to raise the -min-crossrefs parameter when you run bibtex.

To stop this in Lyx, go to Tools/Preferences/Output/LaTeX and set the BibTeX command to: bibtex -min-crossrefs=900

Correlator, our new demo.



Correlator is now live at the Yahoo! Sandbox.

Correlator showcases our ability to locate and classify entities in text, and to sort them by relevance with respect to a query.  You can learn more about it in different places:

Writing a SIGIR (sig-alternate) paper on Lyx

Lyx is wonderful… when it works :)

Here is what I had to do to write my SIGIR paper on Lyx 1.6.1 (I recommend updating to this before continuing…).

First, get the class file that you need (in my case, for SIGIR, the sig-alternate class) and put it somewhere your LaTeX can find it. (For Windows see below; for Unix you can find instructions here.)

For Windows using MiKTeX this means putting the file somewhere under C:\Program Files\MiKTeX 2.7\tex\latex and running MiKTeX/Settings/Refresh FNDB and MiKTeX/Settings/Update Formats.

Now get Lyx to use this class. For this you can follow instructions in the Lyx Customization manual or many web pages on the topic (e.g., here).

Great, almost there. You can start your paper and all works fine… except citations. I got the following error when adding citations:

Use of \@citex doesn't match its definition

… then you must always put `1' after `\a', since control sequence names are

made up of letters only. The macro here has not been

followed by the required stuff, so I'm ignoring it

Not so useful :( The problem seems to be that citation handling in sig-alternate clashes with the babel package. You need to turn babel off in Lyx. You can do this as follows:

  1. In Tools/Preferences/LanguageSettings, untick “Use babel” and “Global”.
  2. You need to clear the “Command start” field (this should not be necessary, probably a Lyx bug?). (You can find the command start later by searching for “babel” in the User Guide.)

This is enough to make it work on a new document. However, if you are halfway through writing your document, some of the language options stick to it. I got the message:

Package babel Error: You haven’t loaded the option english yet.

\select@language{english} You may proceed …

To fix this you need to create a new document and cut and paste the contents of your old document into this new one. This worked for me!

Closer to the brick wall to knowledge

I wrote a simple page that produces a picture of a thing:

It uses DBpedia to find the infobox of a Wikipedia entity and query for its picture.
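The post does not show the query the page actually sends, so the following is only a guess at what such a lookup could look like, written in Python. It assumes the public DBpedia SPARQL endpoint at https://dbpedia.org/sparql, the foaf:depiction property for the picture, and a naive mapping from a name to a resource URI; all function names are made up for the sketch, and no request is sent.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia SPARQL endpoint.
DBPEDIA_SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def resource_uri(name):
    """Naively map a name such as "Karl Marx" to a DBpedia resource URI."""
    cleaned = name.strip().replace(" ", "_")
    # DBpedia resource names start with a capital letter.
    return "http://dbpedia.org/resource/" + cleaned[0].upper() + cleaned[1:]


def build_picture_query(name):
    """Build a SPARQL query asking for one picture of the entity."""
    # Assumption: the picture is exposed through foaf:depiction.
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?picture WHERE { <%s> foaf:depiction ?picture } LIMIT 1"
        % resource_uri(name)
    )


def build_request_url(name):
    """Build the GET URL for the query, asking for JSON results."""
    params = {
        "query": build_picture_query(name),
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_SPARQL_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_picture_query("Karl Marx"))
    print(build_request_url("wine"))
```

A lookup this naive would fail whenever the name does not match a resource exactly or the resource has no picture attached.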

It works very nicely in some examples. Try: wine, peach, “Karl Marx”, woman, dog, fear, swimming…

It also fails on many others. Try: green, war…

And the picture is not what you expect sometimes. Try notebook, apple…

In my opinion, having services such as these is very important, not because they bring us closer to AI or knowledge, but because they bring us closer to the many brick walls separating us from them!


Working out what the promised TAE percentages are really worth…

I was trying to work out the difference between several offers that promise a high TAE (Tasa Anual Equivalente, the annual equivalent rate), but only for a few months… It turned out to be so hard that I ended up writing a page on the subject :)

Here it is: Calculadora de Beneficios TAE (TAE earnings calculator)
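The calculator page itself is not reproduced here, so the following Python sketch only illustrates the kind of arithmetic involved and is not the page's own method. It assumes that a TAE is an effective annual rate, so that a deposit grows by a factor of (1 + TAE)^(months/12), and that an offer pays a promotional rate for a few months and a base rate for the rest of the year. Function names and the example figures are made up.

```python
def interest_earned(principal, tae_percent, months):
    """Interest earned on `principal` after `months` at an effective annual rate."""
    annual_rate = tae_percent / 100.0
    return principal * ((1.0 + annual_rate) ** (months / 12.0) - 1.0)


def first_year_equivalent_tae(promo_tae_percent, promo_months, base_tae_percent):
    """Effective rate over 12 months: promotional rate first, base rate afterwards."""
    promo_growth = (1.0 + promo_tae_percent / 100.0) ** (promo_months / 12.0)
    base_growth = (1.0 + base_tae_percent / 100.0) ** ((12 - promo_months) / 12.0)
    return (promo_growth * base_growth - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical offer: 5% TAE for 4 months, then 1% TAE, on 10000 euros.
    print(round(interest_earned(10000, 5.0, 4), 2))
    print(round(first_year_equivalent_tae(5.0, 4, 1.0), 2))
```

Comparing offers by their equivalent rate over the same period makes a high rate that lasts only a few months directly comparable with a lower rate that lasts all year.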

XEmacs for Windows

Installing Packages

To install packages in XEmacs for Windows I had to:

1) Tools-Packages-SetDownloadSite-OfficialReleases-US (Main XEmacs site). (I tried some of the official European sites but the packages I needed were missing.)

2) Tools-Packages-ListAndInstall

3) Go to the package you need (with the cursor! The mouse gets confused here) and right-click on “Toggle Install”, then on “Add Required”, and finally on “Install/Remove Selected”.

Say yes to whatever questions it asks about creating new directories.