Skilark — Methodology

Skilark provides insights into the technology job market by systematically collecting and analyzing job postings from leading technology companies. Our approach combines automated data collection, AI-powered classification, and human-curated taxonomies to surface patterns in skills demand, hiring trends, and role requirements.

Data Collection

We gather job listings from public career pages and job board APIs maintained by technology companies. Our crawlers visit these sources on a regular schedule and respect rate limits and robots.txt directives. Each request identifies our system with a descriptive User-Agent header, and we pause between requests to avoid overloading company infrastructure. We focus on a curated set of companies across different sectors of the technology industry, from cloud infrastructure and AI research to enterprise software and developer tools.
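As an illustration only, the politeness rules described above (robots.txt checks, an identifying user agent, and delays between requests) might be sketched in Python as follows. This is a minimal sketch and not a description of our production crawler: the ExampleJobsBot user agent, the PolitenessPolicy class, and the 2-second default delay are all hypothetical. It uses only the standard library's urllib.robotparser and makes no network calls.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent; a real crawler would publish its own identifier.
USER_AGENT = "ExampleJobsBot/0.1"


class PolitenessPolicy:
    """Decides whether and when a URL on one host may be requested."""

    def __init__(self, robots_txt: str, min_delay: float = 2.0):
        # Parse robots.txt text that has already been downloaded.
        self._parser = RobotFileParser()
        self._parser.parse(robots_txt.splitlines())
        # Honor Crawl-delay when present, but never go below our own minimum.
        crawl_delay = self._parser.crawl_delay(USER_AGENT)
        self.delay = max(min_delay, float(crawl_delay or 0))
        self._last_request = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits our user agent to fetch this URL."""
        return self._parser.can_fetch(USER_AGENT, url)

    def seconds_to_wait(self, now: float) -> float:
        """Seconds to sleep before the next request may be sent."""
        if self._last_request is None:
            return 0.0
        return max(0.0, self._last_request + self.delay - now)

    def record_request(self, now: float) -> None:
        """Remember when the most recent request was sent."""
        self._last_request = now
```

A crawler loop built on this sketch would skip any URL for which allowed() is false, sleep for seconds_to_wait() before each request, send USER_AGENT in the User-Agent header, and call record_request() afterwards.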

Deduplication and Freshness

Job postings are deduplicated based on their source and external identifier, ensuring we track the same position over time rather than creating duplicate entries. When we encounter a listing we've seen before, we update its last-seen timestamp, allowing us to detect when positions close or are removed from career sites. This temporal tracking enables us to report on both active listings and historical trends in hiring activity.
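The deduplication and last-seen tracking described above can be sketched as a small in-memory store keyed on source and external identifier. This is an illustrative sketch under stated assumptions: the Listing and ListingStore names, the fields shown, and the grace-period rule for treating a listing as closed are hypothetical, and a real system would likely use a database with a unique constraint on the same pair.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Listing:
    source: str
    external_id: str
    title: str
    first_seen: datetime
    last_seen: datetime


class ListingStore:
    """Keeps one record per (source, external_id) pair."""

    def __init__(self):
        self._by_key: dict[tuple[str, str], Listing] = {}

    def __len__(self) -> int:
        return len(self._by_key)

    def upsert(self, source: str, external_id: str, title: str,
               seen_at: datetime) -> Listing:
        """Insert a new listing, or refresh one we have seen before."""
        key = (source, external_id)
        listing = self._by_key.get(key)
        if listing is None:
            listing = Listing(source, external_id, title, seen_at, seen_at)
            self._by_key[key] = listing
        else:
            # Same position seen again: update it instead of duplicating it.
            listing.title = title
            listing.last_seen = seen_at
        return listing

    def presumed_closed(self, now: datetime, grace: timedelta) -> list[Listing]:
        """Listings not seen within the grace period (an assumed rule)."""
        return [l for l in self._by_key.values() if now - l.last_seen > grace]
```

Keeping first_seen alongside last_seen is what makes historical reporting possible: the pair gives an approximate window during which each position was open.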

AI-Powered Enrichment

Raw job descriptions are processed by large language models, which extract structured information from the unstructured text, including required skills, job categories, seniority levels, and remote work policies, and map the extracted skills to our skill taxonomy. The enrichment process standardizes variations such as "machine learning" and "ML" into consistent skill tags. This automated classification allows us to aggregate data across thousands of listings that use different terminology for the same concepts.

Skill Taxonomy

Our skill taxonomy is a curated list of technologies, methodologies, and competencies commonly required in software engineering roles. Skills are manually selected based on their prevalence in the job market and on whether their granularity is useful for analysis. Job listings are mapped to this taxonomy during the enrichment process: a skill mentioned in a description is linked only if it exists in our predefined list. This approach balances comprehensive coverage with manageable categorization, though it means emerging technologies may not appear until we expand the taxonomy.
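To make the "linked only if it exists in our predefined list" rule concrete, here is a minimal sketch of mapping candidate skill strings (for example, phrases extracted from a description) to canonical tags. The three taxonomy entries, their aliases, and the map_to_taxonomy function are hypothetical examples, not our actual taxonomy or code.

```python
# Hypothetical excerpt of a taxonomy: canonical tag -> accepted spellings.
TAXONOMY = {
    "machine-learning": {"machine learning", "ml"},
    "kubernetes": {"kubernetes", "k8s"},
    "python": {"python"},
}

# Reverse index so each alias resolves to its canonical tag in one lookup.
ALIAS_TO_TAG = {
    alias: tag for tag, aliases in TAXONOMY.items() for alias in aliases
}


def map_to_taxonomy(candidates):
    """Return (canonical tags, unmatched strings) for candidate skills."""
    tags = []
    unmatched = []
    for raw in candidates:
        # Normalize case and collapse repeated whitespace before lookup.
        key = " ".join(raw.lower().split())
        tag = ALIAS_TO_TAG.get(key)
        if tag is None:
            # Not in the predefined list, so it is not linked to the listing.
            unmatched.append(raw)
        elif tag not in tags:
            tags.append(tag)
    return tags, unmatched
```

In this sketch, unmatched strings are returned rather than discarded. Counting them over time is one plausible way to notice emerging technologies that are candidates for the next taxonomy expansion.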

Data Limitations

Skilark's data represents a subset of the overall technology job market. We focus on companies with publicly accessible job postings and machine-readable APIs, which excludes organizations without such sources as well as roles posted only on third-party job boards. Our AI classification can misinterpret job descriptions, particularly for highly specialized roles or those using non-standard language. The data reflects point-in-time snapshots of active listings rather than complete hiring funnel metrics like application volumes or time-to-hire.

Our dataset emphasizes companies with substantial engineering organizations and public career sites, which skews toward larger technology companies and well-funded startups. Geographic distribution may not represent the full market, as we currently prioritize listings for US-based positions. Salary information is not systematically available from job postings, limiting our ability to analyze compensation trends.

Update Frequency

We run automated crawlers on regular schedules, with update frequencies varying by source. High-volume companies are checked more frequently to capture new postings quickly, while smaller sources are polled less often. The site reflects data as of the most recent crawl cycle, typically within the past 24 to 48 hours. Historical data allows us to track trends over weeks and months, though our dataset currently covers a limited time window because the project is relatively new.
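A per-source schedule of the kind described above could be sketched as follows. The tier names and the 6-hour and 24-hour intervals are assumptions chosen for illustration, not our actual crawl frequencies.

```python
from datetime import datetime, timedelta

# Hypothetical polling tiers; real intervals would be tuned per source.
POLL_INTERVALS = {
    "high_volume": timedelta(hours=6),
    "standard": timedelta(hours=24),
}


def due_sources(sources, now):
    """Return the names of sources whose polling interval has elapsed.

    `sources` maps a source name to (tier, last_crawled), where
    last_crawled is a datetime, or None if the source was never crawled.
    """
    due = []
    for name, (tier, last_crawled) in sources.items():
        interval = POLL_INTERVALS[tier]
        if last_crawled is None or now - last_crawled >= interval:
            due.append(name)
    return sorted(due)
```

A scheduler would call due_sources() periodically and enqueue a crawl for each returned name, so a high-volume source is revisited several times in the period in which a smaller source is visited once.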

Privacy and Transparency

We collect only publicly available information from company career pages. No applicant data, internal company information, or personally identifiable information is gathered or stored. Our goal is to provide job seekers and industry observers with aggregated insights into hiring patterns, not to surveil individual companies or candidates. We're committed to responsible data practices and welcome feedback on our methodology.