Feeds

Feeds overview

API consumption is provided by an extension of the Drupal Feeds module.

The Feeds module creates content or other entities in a Drupal site from feed data found on the internet. Source data is mapped and massaged onto destination properties of entities in this system.

The Drupal 8 version of the Feeds module is currently in a development release and still has many bugs. These bugs are remediated by custom code, and the overall Feeds functionality is extended substantially to meet the requirements of this application.

Symfony DomCrawler extension

The data in the feeds consumed by this application is inconsistent. Example inconsistencies:

  • Some feeds provide full article bodies, and some provide only teaser descriptions.
  • Most feeds don't have any images directly in the feed.
  • Some feeds provide author data as comma-separated links, and some provide it as plain text.

These inconsistencies are addressed by custom code that extends the Drupal Feeds module and by a custom DOM crawler.

In Drupal 7, there were a few DOM-crawling solutions that extended Feeds, such as Feeds Crawler. But because Drupal 8 is still in its early phases, no public DOM-crawling solution is currently available.

A new module is therefore under development in this application that could be offered as a public contributed module on drupal.org in the future. It is called "Talking Machines Feeds Crawler" (tm_feeds_crawler) and is built as an extension of:

  1. Symfony DomCrawler
  2. Symfony BrowserKit
  3. Guzzle HTTP Client
  4. Goutte Web Scraper
  5. Other areas of Drupal and the open source world
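
These components layer naturally: Goutte's Client extends Symfony BrowserKit, uses Guzzle as its HTTP transport, and returns Symfony DomCrawler instances from its navigation methods. A minimal sketch of that composition (the URL and link text are illustrative placeholders, not values from this application):

```php
<?php

use Goutte\Client;

// Goutte wraps BrowserKit and Guzzle; request() and click()
// both return Symfony DomCrawler instances.
$client = new Client();

// Fetch a page — Guzzle performs the HTTP request under the hood.
$crawler = $client->request('GET', 'https://example.com/feed-item');

// Follow a link on the page — BrowserKit tracks browsing state
// (history, cookies) across requests.
$link    = $crawler->selectLink('Read more')->link();
$crawler = $client->click($link);
```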

Symfony DomCrawler is used to parse data that is publicly available in the feeds and to perform custom logic per feed.

Example DOM crawling sequence:

  1. Click the link associated with the feed item.
  2. Crawl into the article body found on the given page.
  3. Extract images in the article body whose URLs match a certain pattern.
  4. Map the images to the proper image field on the article content type.
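
The sequence above can be sketched with the Goutte, DomCrawler, and Drupal APIs roughly as follows. The CSS selector, URL pattern, and field name here are hypothetical placeholders; in tm_feeds_crawler the equivalents would come from per-feed configuration:

```php
<?php

use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;

// 1. Click the link associated with the feed item.
$client  = new Client();
$crawler = $client->request('GET', $feed_item_url);

// 2. Crawl into the article body on the resulting page.
//    '.article-body' is a placeholder selector.
$body = $crawler->filter('.article-body');

// 3. Extract images whose URL matches a certain pattern.
$image_urls = [];
$body->filter('img')->each(function (Crawler $img) use (&$image_urls) {
  $src = $img->attr('src');
  if ($src !== NULL && preg_match('#/media/uploads/#', $src)) {
    $image_urls[] = $src;
  }
});

// 4. Map the images to the image field on the article node.
//    'field_image' is a hypothetical field machine name.
foreach ($image_urls as $url) {
  $file = file_save_data(file_get_contents($url), 'public://' . basename($url));
  if ($file) {
    $node->field_image[] = ['target_id' => $file->id()];
  }
}
$node->save();
```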

Instead of requiring custom code to craft these DOM-crawling sequences, a UI is provided for building the conditions and actions.

Custom code is the enemy of scale. As a global practice, and whenever possible, these differences in configuration are brought into the UI so the same code can be repurposed to solve future challenges.
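
As a sketch of what such UI-built configuration might look like when exported, a per-feed crawl definition could take a shape like the following. Every key and value here is hypothetical, not the module's actual schema:

```yaml
# Hypothetical crawl configuration for one feed, built in the UI.
tm_feeds_crawler.feed.example:
  follow_item_link: true
  body_selector: '.article-body'
  actions:
    - extract: images
      url_pattern: '/media/uploads/'
      map_to: field_image
    - extract: authors
      source: links        # or 'text' for feeds with plain-text authors
      map_to: field_author
```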