Showing posts with label WebMiner.

Monday, May 13, 2013

Nepomuk WebMiner 0.6

A few months have passed since my last WebMiner update. In the meantime I finished my Master's thesis, moved to a new location and started a new job. A perfect time to release a new version with the changes I have made since.

Besides several bug fixes, Nepomuk WebMiner 0.6 adds the following:

  • User-changeable regular expressions for the filename parsing.
  • The WebMiner's own file metadata extraction was removed. It now reuses Nepomuk's internal file indexing to get ID3 tags and other file metadata.
  • A whitelist for the automatic web search. You might want to look up the folder with your publication PDFs but not your private documents, or the network share with your TV shows but not your private family videos. This works on top of the Nepomuk whitelist, so Nepomuk can index all of these files, but only some of them will be searched on the web.
  • Instead of the dull tree view that showed the raw fetched metadata, you can now see and edit the metadata in several fancy edit fields.
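To give an idea of what a user-changeable filename expression can do, here is a minimal sketch in Python. The pattern, the group names and the parse_filename helper are made up for this example and are not the expressions shipped with the WebMiner. It only assumes TV show files named roughly like "Show.Name.S01E03.Episode.Title.avi".

```python
import re

# Hypothetical pattern for illustration only; the default expressions
# that ship with the WebMiner may look quite different.
TVSHOW_PATTERN = re.compile(
    r"^(?P<show>.+?)[ ._-]+[Ss](?P<season>\d{1,2})[Ee](?P<episode>\d{1,2})"
    r"(?:[ ._-]+(?P<title>.+?))?\.(?P<ext>\w+)$"
)


def parse_filename(filename, pattern=TVSHOW_PATTERN):
    """Return the named groups of the pattern as a dict, or None if it does not match."""
    match = pattern.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    # Dots and underscores are commonly used instead of spaces in filenames.
    info["show"] = re.sub(r"[._]+", " ", info["show"]).strip()
    info["season"] = int(info["season"])
    info["episode"] = int(info["episode"])
    if info["title"]:
        info["title"] = re.sub(r"[._]+", " ", info["title"]).strip()
    return info


if __name__ == "__main__":
    print(parse_filename("The.Wire.S01E03.The.Buys.avi"))
    print(parse_filename("holiday.jpg"))
```

Because the pattern is passed in as a parameter, a differently named collection only needs a different expression, which is the point of making it user changeable.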

You can find the latest release on projects.kde.org or the tarball on kde-apps.org.
Even though I wanted to get this into KDE SC 4.11, I doubt this is going to happen. The soft feature freeze is around the corner, and I don't feel comfortable enough yet to let this be part of SC and annoy all users with this service. There are still a lot of usability problems I would like to have solved properly before this can be part of any KDE installation.

So please test the latest release and report any errors back to me.

Wednesday, February 6, 2013

Nepomuk WebMiner 0.5

Since my last post a lot has happened to the Nepomuk WebMiner (formerly MetaDataExtractor).

The WebMiner went through the KDE Review process and got cleaned up a bit along the way. Its new location is extragear/base/nepomuk-webminer.

On the code side, I have fixed several bugs and integrated the automatic fetching better into the current Nepomuk system.

The new WebMiner service respects suspend/resume and the event monitoring (no internet, low disk space, running on battery) in the same way the FileIndexer does.

When the automatic fetching is started via the command line or the Dolphin action, the service is used for the actual fetching. This allows the current fetching progress to be shown in the nepomukcontroller (in the systray).

Starting with KDE 4.11, the System Settings modules for Nepomuk and the Nepomuk WebMiner are combined and won't show up as two different entries anymore.

The buggy IMDb Python script had a hard time keeping up with the changes on the IMDb website, which made proper fetching of movie resources unreliable. It has been replaced by a new plugin for themoviedb.org.

The next step for the WebMiner will be the full integration into KDE SC for the 4.11 release.
That means moving out of extragear again and into some other, more suitable place.

In order to make this happen, there is still one large blocker that needs to be resolved.

So if anyone is good with Python and has some time, the script at nepomuk-core/services/storage/rcgen/nepomuk-simpleresource-rcgen.py needs to be improved.
This script is responsible for generating the SimpleResource classes from the ontologies in use.
As it takes nearly all ontologies into account and is rather slow right now, each generation run takes ~20 minutes. This is a pain for anyone compiling the WebMiner from source.

Any help is very welcome.