Wednesday, February 6, 2013

Nepomuk WebMiner 0.5

Since my last post, a lot has happened to the Nepomuk-WebMiner (formerly MetaDataExtractor).

The WebMiner went through the KDE Review process and got cleaned up a bit along the way. Its new location is extragear/base/nepomuk-webminer.

On the code side, I have fixed several bugs and integrated the automatic fetching better into the current Nepomuk system.

The new WebMiner service respects suspend/resume and event monitoring (no internet, low disk space, on battery) in the same way the FileIndexer does.

When the automatic fetching is started from the command line or via the Dolphin command, the service does the actual fetching. This makes it possible to show the current fetching progress in the nepomukcontroller (in the systray).

Starting with KDE 4.11, the System Settings for Nepomuk and the Nepomuk WebMiner are combined and no longer show up as two separate entries.

The old IMDb Python script was buggy and had a hard time keeping up with changes to the IMDb website, so a new plugin for themoviedb.org was created to fetch movie resources instead.

The next step for the WebMiner will be full integration into KDE SC for the 4.11 release, which means moving out of extragear again into a more suitable place.

To make this happen, there is still one large blocker that needs to be resolved.

So if anyone is good with Python and has some time, the script at nepomuk-core/services/storage/rcgen/nepomuk-simpleresource-rcgen.py needs to be improved.
This script is responsible for generating the SimpleResource classes from the ontologies in use.
As it takes nearly all ontologies into account and is rather slow right now, each generation run takes ~20 minutes. This is a pain for anyone compiling the WebMiner from source.
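
For anyone who wants to pick this up, a reasonable first step is probably to measure where the time goes before changing anything. The following is only a minimal sketch using Python's standard cProfile and pstats modules. The `generate_classes` function is a hypothetical stand-in, not the real entry point of the rcgen script, and `profile_call` is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text).

    The report lists the `top` entries sorted by cumulative time,
    which usually points at the slowest call chains first.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def generate_classes(ontology_names):
    # Hypothetical stand-in for the generator: the real script parses
    # ontologies and writes SimpleResource classes, which is not shown here.
    return [name.title() + "Resource" for name in ontology_names]


if __name__ == "__main__":
    classes, report = profile_call(generate_classes, ["nmm", "nbib"])
    print(report)
```

The same kind of report can be obtained without touching the script at all by running it through `python -m cProfile -s cumulative` followed by the script path and its usual arguments.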

Any help is very welcome.