PHP Classes

PHP HTML to Text Conversion: Parse HTML and extract text contained in it

Last Updated: 2022-10-21
Ratings: Not enough user ratings
Unique User Downloads: Total: 392, This week: 4
Download Rankings: All time: 6,589, This week: 221
Version: html2text 1.0.18
License: GNU General Publi...
PHP version: 5
Categories: HTML, PHP 5, Text processing
Collaborate with this project: html2text - github.com

Description

This class can parse HTML and extract text contained in it.

It can take a given HTML string and parse it to extract the text in the HTML document.

The class can change the case of the text inside certain HTML elements, as well as prepend or append a given text.

Innovation Award
PHP Programming Innovation award nominee
December 2016
Number 9
Most PHP applications generate HTML, but sometimes a text version of that HTML is also needed, for instance to send an email that includes both the HTML and a plain-text alternative.

This package provides a solution that automatically creates the text version of a given HTML document, which you can use in email messages or for other purposes.

Manuel Lemos
Author: Lars Moelleken
Innovation award: nominee 11x, winner 1x

Details


Html2Text

Description

Convert HTML to formatted plain text, e.g. for text mails.

Installation

The recommended way to install the package is through Composer.

$ composer require voku/html2text

Basic Usage

$html = new \voku\Html2Text\Html2Text('Hello, &quot;<b>world</b>&quot;');

echo $html->getText();  // Hello, "WORLD"
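
Since the text output is meant for text mails, a typical next step is to pair it with the original HTML in a multipart/alternative message. The following is a minimal sketch of that step only, assuming PHP 7.1 or later. buildAlternativeBody() is a hypothetical helper and not part of voku/html2text; the only library calls used are the constructor and getText() shown above, and they appear in the commented usage.

```php
<?php
// Sketch: assemble a multipart/alternative mail body from a plain-text
// part and an HTML part. buildAlternativeBody() is a hypothetical helper,
// not part of voku/html2text.

function buildAlternativeBody(string $text, string $html, string $boundary): string
{
    // Plain text goes first; mail clients prefer the last part they support.
    $parts = [
        ['text/plain', $text],
        ['text/html', $html],
    ];

    $body = '';
    foreach ($parts as [$type, $content]) {
        $body .= "--{$boundary}\r\n"
            . "Content-Type: {$type}; charset=UTF-8\r\n"
            . "Content-Transfer-Encoding: base64\r\n\r\n"
            . chunk_split(base64_encode($content), 76, "\r\n");
    }

    // The closing delimiter ends the multipart body.
    return $body . "--{$boundary}--\r\n";
}

// Usage, assuming the package was installed with Composer:
//
//   require __DIR__ . '/vendor/autoload.php';
//
//   $html     = '<p>Hello, <b>world</b></p>';
//   $text     = (new \voku\Html2Text\Html2Text($html))->getText();
//   $boundary = bin2hex(random_bytes(12));
//   $body     = buildAlternativeBody($text, $html, $boundary);
//   $headers  = "MIME-Version: 1.0\r\n"
//       . "Content-Type: multipart/alternative; boundary=\"{$boundary}\"";
//   mail('user@example.com', 'Subject', $body, $headers);
```

Both parts are base64-encoded here so that long lines and non-ASCII characters do not need separate handling; a dedicated mail library would normally take care of this.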

Extended Usage

Each element (h1, li, div, etc.) can have the following options:

  • 'case' => convert case (Html2Text::OPTION_NONE, Html2Text::OPTION_UPPERCASE, Html2Text::OPTION_LOWERCASE, Html2Text::OPTION_UCFIRST, Html2Text::OPTION_TITLE)
  • 'replace' => replace a string, given as array(search, replacement)
  • 'prepend' => prepend a string
  • 'append' => append a string

For example:

use voku\Html2Text\Html2Text;

$html = '<h1>Should have "AAA" changed to BBB</h1><ul><li>• Custom bullet should be removed</li></ul><img alt="The Linux Tux" src="tux.png" />';
$expected = 'SHOULD HAVE "BBB" CHANGED TO BBB' . "\n\n" . '- Custom bullet should be removed |' . "\n\n" . '[IMAGE]: "The Linux Tux"';

$html2text = new Html2Text(
    $html,
    array(
        'width'    => 0,
        'elements' => array(
            // Headings: replace AAA with BBB and convert to upper case.
            'h1' => array(
                'case'    => Html2Text::OPTION_UPPERCASE,
                'replace' => array('AAA', 'BBB'),
            ),
            // List items: drop the custom bullet, then wrap in "- " and " |".
            'li' => array(
                'case'    => Html2Text::OPTION_NONE,
                'replace' => array('•', ''),
                'prepend' => "- ",
                'append'  => " |",
            ),
        ),
    )
);

$html2text->setPrefixForImages('[IMAGE]: ');
$html2text->setPrefixForLinks('[LINKS]: ');
$html2text->getText(); // === $expected
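
When several elements should share the same settings, the 'elements' array can be generated instead of written out by hand. This is only a sketch: elementOptions() is a hypothetical helper, not part of voku/html2text, and its whitelist is an assumption limited to the four keys that appear on this page. The library itself may accept other keys.

```php
<?php
// Sketch: build the 'elements' option for several tags at once.
// elementOptions() is a hypothetical helper, not part of voku/html2text;
// it only produces the array shape used in the example above.

function elementOptions(array $tags, array $settings): array
{
    // Assumption: only the option keys shown on this page are allowed here.
    $allowed = ['case', 'replace', 'prepend', 'append'];
    $unknown = array_diff(array_keys($settings), $allowed);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown option(s): ' . implode(', ', $unknown)
        );
    }

    // Every tag receives its own copy of the same settings.
    return array_fill_keys($tags, $settings);
}

// Usage, assuming the package was installed with Composer:
//
//   require __DIR__ . '/vendor/autoload.php';
//   use voku\Html2Text\Html2Text;
//
//   $options = [
//       'width'    => 0,
//       'elements' => elementOptions(
//           ['h1', 'h2', 'h3'],
//           ['case' => Html2Text::OPTION_UPPERCASE]
//       ),
//   ];
//   $text = (new Html2Text($html, $options))->getText();
```

Rejecting unknown keys early makes a misspelled option name fail loudly rather than being passed along unnoticed.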

Live Demo

  • HTML | TEXT
  • https://moelleken.org/url_to_text.php?url=https://ADD_YOUR_URL_HERE

History

This library started life on the blog of Jon Abernathy: http://www.chuggnutt.com/html2text

A number of projects picked up the library and started using it, among them RoundCube mail. They made a number of updates to it over time to suit their webmail client.

This package is an extended fork of the original Html2Text.

Support

For support and donations please visit GitHub | Issues | PayPal | Patreon.

For status updates and release announcements please visit Releases | Twitter | Patreon.

For professional support please contact me.

Thanks

  • Thanks to GitHub (Microsoft) for hosting the code and for a good infrastructure, including issue management, etc.
  • Thanks to IntelliJ as they make the best IDEs for PHP and they gave me an open source license for PhpStorm!
  • Thanks to Travis CI for being the most awesome, easiest continuous integration tool out there!
  • Thanks to StyleCI for the simple but powerful code style check.
  • Thanks to PHPStan && Psalm for really great static analysis tools and for discovering bugs in the code!

Files

File                            Role     Description
src/ (1 file)
tests/ (26 files, 1 directory)
.editorconfig                   Data     Auxiliary data
.scrutinizer.yml                Data     Auxiliary data
.styleci.yml                    Data     Auxiliary data
.travis.yml                     Data     Auxiliary data
CHANGELOG.md                    Data     Auxiliary data
composer.json                   Data     Auxiliary data
LICENSE.md                      Lic.     License text
phpcs.php_cs                    Example  Example script
phpstan.neon                    Data     Auxiliary data
phpunit.xml                     Data     Auxiliary data
README.md                       Doc.     Documentation

