All posts by thePanz

Accessing SOLR server instance with SSH Tunnelling

I recently discovered the power of SSH tunneling: a Swiss Army knife for controlling remote services when SSH is the only access you have to a server.

If your remote server only provides SSH access and you need to reach (debug, test, check...) the services running on that machine, SSH tunneling comes to the rescue. Keeping things simple: SSH lets you open a listening socket on your own machine and forwards all of its traffic to a specific host and port, as seen from the SSH-accessed server.

In the following example I will describe this scenario: on my remote server (IP: x.x.x.x) a Solr instance has been configured as a search service. It has been locked down to accept only local connections coming from 127.0.0.1, and I need to check its configuration through the Solr admin panel (usually reachable on port 8983). The remote server is only accessible via SSH (port 22) and HTTPS (port 443).

With SSH tunneling I was able to access the Solr admin panel from my browser: all the data travels over the Internet to the server inside the SSH connection.

The command I used is quite simple:

ssh -L localhost:8080:127.0.0.1:8983 user@x.x.x.x -N -C

Where:

  • 8080 is the local port where the tunnel starts (on my machine)
  • 127.0.0.1:8983 is the destination IP:port for my requests; in other words, where the tunnel points to (resolved as if the connection started from the remote server itself)
  • user@x.x.x.x is the remote SSH user and host
  • -N do not execute any remote command: just keep the connection open for the port forwarding
  • -C enable SSH protocol compression (we care about speed, don’t we? It mostly helps on slow links)

Once the connection has been established and the SSH tunnel is built, we can visit “http://localhost:8080” to access the remote Solr admin interface.
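If you open this tunnel often, the command can be wrapped in two small shell functions. This is only a sketch: the function names and the socket path are made up for illustration, and user@x.x.x.x is the same placeholder used in the command shown earlier. It relies on the standard OpenSSH options -f (go to background), -M/-S (control socket) and -O exit (close the master connection).

```shell
#!/bin/sh
# Placeholders from the post: replace with your real user and host.
REMOTE="user@x.x.x.x"
# Hypothetical path for the control socket used to close the tunnel later.
SOCKET="${TMPDIR:-/tmp}/solr-tunnel.sock"

open_tunnel() {
    # -f: go to background once authenticated
    # -N: no remote command, forwarding only
    # -C: compression
    # -M -S: create a control socket so the tunnel can be closed cleanly
    # ExitOnForwardFailure: fail right away if local port 8080 is busy
    ssh -f -N -C -M -S "$SOCKET" -o ExitOnForwardFailure=yes \
        -L localhost:8080:127.0.0.1:8983 "$REMOTE"
}

close_tunnel() {
    # Ask the background master connection to exit.
    ssh -S "$SOCKET" -O exit "$REMOTE"
}
```

Call open_tunnel, browse to http://localhost:8080, then call close_tunnel when you are done.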

For a more detailed description of SSH tunneling use cases, please check the SSH manual.

Easychair data extraction

Have you ever heard of EasyChair? It is a free, simple and efficient way to manage a (scientific) conference: it provides most of the tools for handling paper submission, approval and camera-ready submission (for further details please refer to the EasyChair website).

The first issue is that some advanced features (such as complete data access as XML) are only available as a paid service. What if the data you need is already available, but only as an (ugly) HTML file? I needed the whole list of accepted papers and the only option was an HTML page, laid out with DIVs and not, as some accessibility rules suggest, as a table. First solution: copy and paste from the HTML into a spreadsheet. More advanced: write a script that converts the file into “well written” HTML. In the generated file the list of papers is in an HTML table, no stylesheets are applied and all the links to the authors' web pages are removed.

Here we go: a simple set of sed rules to convert the list of accepted papers to a table-based page.
Put all of these in a .sed file and invoke the sed command as:

sed -f file.sed < accepted-papers.html > accepted-papers-converted.html

The file.sed contents:

s/<br\/>/ /g
s/<style>.*<\/style>//g
s/<\/h1>/<\/h1><table>/g
s/<\/body>/<\/table><\/body>/g
s/<b>Abstract: <\/b>//g
s/<\/div><div class="paper">/<\/tr><tr><td>/g
s/<div class="paper">/<tr><td>/g
s/<span class="authors"><span>//g
s/<\/span>\. <\/span>/<\/td>/g
s/<span class="authors">//g
s/\. <\/span>/<\/td>/g
s/<span class="title">/<td>/g
s/<\/div><div class="abstract">/<\/td><td>/g
s/<a href="[^"]*">\([^<]*\)<\/a>/\1/g
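The last rule, which strips links while keeping the link text, can be tried in isolation on a single line. The sketch below wraps it in a function whose name is made up for illustration, and feeds it an invented sample line (the author names and URLs are not from a real EasyChair page).

```shell
#!/bin/sh
# Hypothetical helper: applies only the link-stripping rule, with a single g flag.
strip_links() {
    sed 's/<a href="[^"]*">\([^<]*\)<\/a>/\1/g'
}

# Invented sample input, for illustration only.
printf '%s\n' '<a href="http://example.org/~alice">Alice Example</a> and <a href="http://example.org/~bob">Bob Example</a>' \
    | strip_links
# prints: Alice Example and Bob Example
```

Note that sed works line by line, so a rule like this only matches a link whose opening and closing tags sit on the same line.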

Firefox Sessionstore.js fixer

Sometimes Firefox opens with all of my tabs (and groups) empty: no session restore is offered and it seems that all of my groups (more than 10 groups, with a total of 200 tabs) have simply disappeared... It has happened not only to me, as some googling shows.

What TH? Firefox simply messed something up in your sessionstore.js file. Blogs suggest you simply remove it and replace it with the backup that Firefox creates automatically, named sessionstore.old
But... what if even this gives you an empty set of tabs? Something is really wrong, given the 3.8 MB sessionstore.js file!

There is only one solution: open the sessionstore.js file, fix its JSON contents and save it. But the problem here is: how can I edit a 3.8 MB JSON file without messing everything up?

Continue reading Firefox Sessionstore.js fixer

Drupal7 Image formatter

I’ve recently started using Drupal7 for a personal website project. I finally had the chance to test the new Drupal7 APIs and modules in the field.

It’s amazing to see how much the Views module has evolved into something that I can’t do without 🙂 Great to see CCK (now Fields) in core and the increasing number of themes supporting HTML5 (AdaptiveTheme, Omega, etc.).

The major difficulty is locating, inside the Admin section, where the configuration settings have been moved... After a while I was able to fix Iconizer's missing icons and finally, recognizing the same icons, started to set up Drupal7 as fast as I do in Drupal6!

Among the great enhancements, I found the added ability to easily configure a field formatter in the content-type settings form fantastic. What I feel is missing, for the Image field type, is a way to limit the number of images displayed by the formatter. Since the Image field is in core it’s a little bit difficult to get a patch approved… so I developed a *new* Image formatter that extends the existing one, adding the “Limit images” feature.

Continue reading Drupal7 Image formatter

Lucene with PlingStemmer

I’ve recently been working with Java Lucene and its Analyzers, and for a project I worked on the client needed to use the Porter stemmer algorithm. I used the SnowballAnalyzer, but unfortunately I found out that, as someone before me said, the Porter stemmer works right in 90% of cases, but when it fails, it fails hard! The example is the following: consider the words “organic”, “organ” and “organization”. The three words don’t have a lot in common except for their prefix, and they do not mean the same thing... but for Porter (and for the SnowballAnalyzer) they’re all stemmed into “organ”. In the Lucene 3.1.x release there will be plenty of new features allowing programmers to control and fine-tune each stemming algorithm.

So, what can I do since I must use the 3.0.3 release? Well... I created a new PlingStemmerFilter using the YAGO Java Pling stemmer implementation, following the instructions found here.

Continue reading Lucene with PlingStemmer