 JobClient - Get off your bike!


News 14 April 2004: It's been a while...

I just spent a year back at uni getting a master's degree. It took me off the job market, and it used up all my coding juice for the duration.

Now that I've finally come back to the project, I find that the pages I was using as spider fodder have changed their layout, and the code no longer works. It's fixable.

I also want to make some design improvements. Specifically, I want to decouple the storage from the browser and the sorter, and use MySQL as the primary store: the bot drops jobs into the database, and the browser reads them from there.

I also want to make it easier to write new bot modules so users can plug in spiders for the sites that suit them.
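One way such a plugin format could work is sketched below in Perl. Everything here is hypothetical: the JobClient::Bot:: namespace and the new/name/fetch methods are illustrative assumptions, not part of the current code, and the example plugin returns canned data instead of scraping a real site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical plugin: one package per job site, each providing
# new(), name() and fetch(). These names are assumptions for illustration.
package JobClient::Bot::Example;

sub new  { my ($class, %opt) = @_; return bless { %opt }, $class }
sub name { 'Example' }

# A real plugin would download and scrape its site here; this one
# returns canned job records so the sketch runs offline.
sub fetch {
    my ($self) = @_;
    return (
        { ref => 'EX-1', title => 'Perl developer', to => 'jobs@example.com' },
        { ref => 'EX-2', title => 'Java developer', to => 'jobs@example.com' },
    );
}

package main;

# The list of plugin classes to run; this could come from jobclient.cfg.
my @plugins = ('JobClient::Bot::Example');

my @jobs;
for my $class (@plugins) {
    my $bot = $class->new;
    for my $job ($bot->fetch) {
        $job->{source} = $bot->name;    # same idea as the 'source' template field
        push @jobs, $job;
    }
}
printf "%s %s: %s\n", $_->{source}, $_->{ref}, $_->{title} for @jobs;
```

Adding a site would then mean writing one more package with the same three methods; the fetcher only needs the list of class names.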

 
Tedious Legal Stuff
This package is copyright Nick Fortune, 2002, and is distributed under the Perl Artistic Licence, a copy of which is included in this distribution. Basically, you can't sell this, but you're welcome to give it away.

What It Does
JobClient is a screen-scraping application that finds jobs matching your skill set on a number of web sites, stores the details in a MySQL database, and gives you a GUI to sort, browse and apply for those jobs.

At the moment, the software is broken because of layout changes at the target sites. I expect to have something useful up in a week or so.

What You Need
Perl 5.6 or 5.8 to start with. JobClient is currently being developed under RedHat Linux 7.2. I don't expect many problems on other unices.

Windows is a little more problematic. The project started life on Win95 with ActiveState Perl, and I think most of the traps there have already been dealt with, but it has been a while since I tried to use it on that platform. Best of luck to you.

Downloads

Have a look at the project page over on SourceForge.
Installation

  1. Unpack the archive into $HOME/jobclient or somesuch. 
  2. Find somewhere suitable for the contents of the lib directory and/or set your PERL5LIB environment variable thus:     

    #
    # Assuming ksh, bash or similar...
    #
    export PERL5LIB=$PERL5LIB:$HOME/jobclient/lib


  3. Edit jobclient.cfg.

    • Set base_dir to the directory where you unpacked JobClient.
    • Set "pro" to a list of keywords you want to see in your jobs.
    • Set "con" to a list of keywords you do not want to see.
    • Set subj_kill to a list of words; if any of them appears in a job's title, the job is killed before you even see it.

  4. Edit templ.txt

    • Change the "cc: " line to your address.
    • Change the "attach: " line to the document you want attached when you email applications. (Your CV works well in this role...)

  5. Set JOBCLIENT_CFG in your environment to point at $HOME/jobclient/jobclient.cfg
  6. Connect to the internet (if you have to)
  7. Run fetch.pl to get the day's data. Wait for it to finish.
  8. Disconnect from the internet (if you want to).
  9. Run jobclient.pl to browse the data you fetched. Kill makes a job vanish from the browse list; Apply queues an email to be sent to the advertiser.
  10. When you're done, reconnect and run unspool.pl to actually send the email.
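The pro, con and subj_kill settings from step 3 can be illustrated with a short Perl sketch. The kill rule follows the description above (any subj_kill word in the title drops the job). The scoring rule for pro and con (one point per matching keyword, con subtracted from pro, best score first) and the whole-word, case-insensitive matching are assumptions for illustration, not necessarily what jobclient.pl does.

```perl
use strict;
use warnings;

# Assumed settings; in JobClient these come from jobclient.cfg.
my @pro       = qw(perl linux mysql);
my @con       = qw(cobol windows);
my @subj_kill = qw(graduate trainee);

# Build one case-insensitive, whole-word regex from a word list.
sub word_re {
    my $alt = join '|', map { quotemeta($_) } @_;
    return qr/\b(?:$alt)\b/i;
}
my $kill_re = word_re(@subj_kill);

# Count how many keywords from @words occur in $text.
sub hits {
    my ($text, @words) = @_;
    return scalar grep { $text =~ /\b\Q$_\E\b/i } @words;
}

my @jobs = (
    { title => 'Senior Perl Developer', desc => 'Perl, MySQL on Linux' },
    { title => 'Graduate Programmer',   desc => 'Perl training' },
    { title => 'Systems Analyst',       desc => 'COBOL on Windows' },
);

# subj_kill: drop a job outright if a kill word is in its title.
my @kept = grep { $_->{title} !~ $kill_re } @jobs;

# pro/con: score the survivors (assumed rule) and sort best first.
$_->{score} = hits("$_->{title} $_->{desc}", @pro)
            - hits("$_->{title} $_->{desc}", @con) for @kept;
my @sorted = sort { $b->{score} <=> $a->{score} } @kept;

printf "%+d  %s\n", $_->{score}, $_->{title} for @sorted;
```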
Jobs that have been sent or killed have their reference code stored (if supplied) so that you don't see them if they appear more than once.
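A minimal sketch of that bookkeeping using Storable, which is already on the module list. The file name seen.sto and the layout (a hash keyed by reference code) are assumptions; the real on-disk format isn't described here.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical location and name for the store of handled reference codes.
my $seen_file = File::Spec->catfile(tempdir(CLEANUP => 1), 'seen.sto');

# Load the set of already-handled reference codes (empty on the first run).
my $seen = -e $seen_file ? retrieve($seen_file) : {};

# Record a job that was sent or killed; jobs without a ref can't be tracked.
sub mark_handled {
    my ($job) = @_;
    return unless defined $job->{ref} && length $job->{ref};
    $seen->{ $job->{ref} } = time;
}

# Keep only jobs whose ref has not been handled before.
sub unseen { grep { !(defined $_->{ref} && exists $seen->{ $_->{ref} }) } @_ }

mark_handled({ ref => 'JS-123' });
store($seen, $seen_file);

# Simulate the next run: reload the store and filter a fresh batch.
$seen = retrieve($seen_file);
my @fresh = unseen({ ref => 'JS-123' }, { ref => 'JS-456' }, { title => 'no ref' });
print scalar(@fresh), " new job(s)\n";
```

Jobs with no reference code always come through, which matches the "(if supplied)" caveat above.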

The other script in the suite is save_csvs.pl. This copies .csv files into the csvs directory and munges the name to incorporate a timestamp.  You may want to periodically clear this directory if you use this feature. It can get big quite fast.
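A sketch of the same idea in Perl. It assumes a naming scheme of name-YYYYMMDD-HHMMSS.csv. The scheme save_csvs.pl actually uses may differ, and save_csvs() is a hypothetical function name.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Basename qw(fileparse);
use File::Path qw(mkpath);
use File::Spec;
use File::Temp qw(tempdir);
use POSIX qw(strftime);

# Copy every .csv in $src_dir into $dst_dir, inserting a timestamp
# into the name (assumed scheme: name-YYYYMMDD-HHMMSS.csv).
sub save_csvs {
    my ($src_dir, $dst_dir) = @_;
    mkpath($dst_dir) unless -d $dst_dir;
    my $stamp = strftime('%Y%m%d-%H%M%S', localtime);
    my @saved;
    for my $path (glob(File::Spec->catfile($src_dir, '*.csv'))) {
        my ($base) = fileparse($path, qr/\.csv/);
        my $dest = File::Spec->catfile($dst_dir, "$base-$stamp.csv");
        copy($path, $dest) or die "copy $path -> $dest failed: $!";
        push @saved, $dest;
    }
    return @saved;
}

# Demo in a throwaway directory with two small CSV files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(jobserve cw360)) {
    open my $fh, '>', File::Spec->catfile($dir, "$name.csv") or die $!;
    print $fh "ref,title\n";
    close $fh;
}
my @saved = save_csvs($dir, File::Spec->catfile($dir, 'csvs'));
print "$_\n" for @saved;
```

Because every run adds a new set of files, the csvs directory only grows unless something clears it.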

Modules

The scripts make use of a number of modules, and I've pretty much lost track of which ones come with the standard distribution. This is what you'll need:
  • Storable
  • MIME::Lite
  • Net::SMTP
  • HTML::TreeBuilder
  • HTTP::Request
  • HTTP::Response
  • LWP::Simple
  • LWP::UserAgent
  • Tk      
  • Tk::LabFrame
  • Tk::ROText
Look on CPAN or run perl -MCPAN -e shell if there are any you don't have.
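To find out which of these are missing before going to CPAN, a short Perl loop that tries to load each one is enough. This is a convenience sketch, not a script shipped with JobClient.

```perl
use strict;
use warnings;

# The modules listed above.
my @modules = qw(
    Storable MIME::Lite Net::SMTP HTML::TreeBuilder
    HTTP::Request HTTP::Response LWP::Simple LWP::UserAgent
    Tk Tk::LabFrame Tk::ROText
);

my @missing;
for my $mod (@modules) {
    # Turn the module name into the file require() expects,
    # e.g. Tk::ROText -> Tk/ROText.pm
    (my $file = "$mod.pm") =~ s{::}{/}g;
    if (eval { require $file; 1 }) {
        my $version = $mod->VERSION;
        printf "%-18s ok (%s)\n", $mod, defined $version ? $version : 'unknown version';
    }
    else {
        push @missing, $mod;
        printf "%-18s MISSING\n", $mod;
    }
}
print(@missing ? "Install these from CPAN: @missing\n" : "All modules found.\n");
```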

The Template
The template file is defined in the configuration file. Whatever is in this file gets used as the basis for any email application you send.  The message gets customised in two ways:

Firstly, each header line and the message body as a whole are expanded with eval. This means that the interpreter will attempt to expand any variables in the template. (Exception: '@' characters in the headers are escaped, since they tend to crop up in email addresses.)

Specifically, a hash named %values is intended for use in expansion. The fields it currently understands are:

  • to: the email address for the job
  • first: the first name of the agent/employer
  • full: the full name of the agent/employer
  • ref: the agent/employer's reference code
  • source: a description of where the job was found, currently either Jobserve or CW360

The second level of customisation is the email window, where you have a chance to modify the expanded message before it is sent.
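The first level of customisation (eval expansion) can be sketched as follows. The header lines and body are inlined here rather than read from templ.txt, and the contents of %values are made up. Each string is wrapped in qq{} before eval, so the sketch also assumes the template has balanced braces. It illustrates the technique described above; it is not the code in JobClient itself.

```perl
use strict;
use warnings;

# %values as described above; these contents are made up.
our %values = (
    to     => 'agent@example.com',
    first  => 'Jane',
    full   => 'Jane Smith',
    ref    => 'JS/12345',
    source => 'Jobserve',
);

# Expand Perl variables in $text the way a double-quoted string would,
# by eval'ing it inside qq{}. Assumes $text has balanced braces.
sub expand {
    my ($text) = @_;
    my $out = eval "qq{$text}";
    die "template expansion failed: $@" if $@;
    return $out;
}

# Template as it might appear in templ.txt (single-quoted so nothing
# is interpolated until expand() runs).
my @headers = ('to: $values{to}', 'cc: me@example.com',
               'subject: Application, ref $values{ref}');
my $body = <<'END';
Dear $values{first},

I saw your advert on $values{source} and would like to apply.
END

# Headers: escape '@' first so "me@example.com" is not read as an array.
my @expanded = map { (my $h = $_) =~ s/\@/\\\@/g; expand($h) } @headers;
my $message  = expand($body);

print "$_\n" for @expanded;
print "\n$message";
```

Without the '@' escaping, the cc: line would fail under strict, because @example would be read as an undeclared array.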

Security

Because of the way the expansion mechanism is coded (and I'm open to suggestions here) there is a security risk if a hostile party can edit your template file. Since the eval "" construct will expand any stringifiable expression, any strings of the form
${ any_old_code }
will result in any_old_code being executed. So write protect your template and don't run this as root!
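Since suggestions are welcome, here is one possible alternative. It is not what JobClient currently does. Instead of using eval, it substitutes only the literal pattern $values{name} with a regular expression, so anything else in the template, including ${ ... } blocks, stays plain text. expand_safely() is a hypothetical name.

```perl
use strict;
use warnings;

my %values = (first => 'Jane', ref => 'JS/12345');

# Replace only the literal pattern $values{word}. Unknown keys are left
# as-is, and nothing in the template is ever compiled or executed.
sub expand_safely {
    my ($text, $vals) = @_;
    $text =~ s/\$values\{(\w+)\}/exists $vals->{$1} ? $vals->{$1} : "\$values{$1}"/ge;
    return $text;
}

my $tmpl = 'Dear $values{first}, re $values{ref}. ${ \ evil() }';
my $out  = expand_safely($tmpl, \%values);
print "$out\n";
```

The trade-off is that templates can only use the named fields, not arbitrary Perl expressions. On the other hand, the '@' escaping is no longer needed, because nothing else gets interpolated.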

Odds and Ends

There are a couple of weird-looking bits in the code. One is the half-assed sendmail reimplementation in MailWin.pm; the other is the pseudo-pipe IPC for the monitor. Both have their origins in the Windows implementation (pipes don't work properly, fork is emulated, and sendmail didn't exist).

The first cut of this package contained a lot of hard-coded values. To get it ready for SourceForge, I've made some hasty last-minute hacks and managed to remove a lot of them. This gives us two problems: hasty last-minute hacks and unexpected hard-coded values. Feel free to let me know if I've missed anything major. I'll have a more polished version ready in a week or two.

To Do

  • Fix the fetch mechanism to work with the new Jobserve and CW360 layouts.
  • Clean up fetch.pl. It works, but it's messy and slow, and its error handling is poor. Additionally, I want to:
    • make the set of sites queried configurable
    • organise some sort of plugin format for screen-scraping modules
    • integrate it with the jobclient.pl script

  • Allow runtime data restriction based on patterns applied to one or more fields
  • Use DBI to allow generic database integration with anything that has a DBD driver. This is almost done.
  • Set up a proper install environment
  • Re-test under Windows and see what needs fixing. It worked the last time I tried it, but that was many mods ago...
And Finally...

Thank you for your interest. It's been a blast, but I have to go now.

Nick Fortune,  Sep 13, 2002.