
A modular architecture for Unicode text compression


If you have a question about this talk, please contact Adam Gleave.

Unicode is now ubiquitous, with 87% of online content encoded in UTF-8. Conventional compression techniques operate on individual bytes: this works well for ASCII, but poorly for UTF-8, where a single character can span multiple bytes. Previous attempts at Unicode compression have invented new algorithms from scratch, with generally poor results. My approach is to extend existing data compression algorithms to operate over Unicode characters. I find this substantially improves compression effectiveness for Unicode text, with only a small overhead for ASCII and binary files.
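To illustrate the mismatch between bytes and characters described above, here is a minimal Python sketch. It is not the speaker's implementation. The helper name `utf8_lengths` and the sample string are hypothetical and chosen only for illustration. It shows how one piece of text looks to a byte-oriented compressor (a sequence of byte values) and to a character-oriented one (a sequence of code points).

```python
def utf8_lengths(text):
    """Return the number of UTF-8 bytes used by each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


# Hypothetical sample: 'a', e-acute, the euro sign, and an emoji,
# which need 1, 2, 3 and 4 bytes in UTF-8 respectively.
sample = "a\u00e9\u20ac\U0001F600"

# What a byte-oriented compressor sees: one symbol per byte.
byte_symbols = list(sample.encode("utf-8"))

# What a character-oriented compressor sees: one symbol per code point.
char_symbols = [ord(ch) for ch in sample]

print(utf8_lengths(sample))
print(len(byte_symbols), "byte symbols vs", len(char_symbols), "character symbols")
```

For pure ASCII text the two views coincide, since every character is one byte. They diverge only when multi-byte characters appear, and that is the case where a byte-level model has to learn character boundaries on its own.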

Please note the talk will last for 15 minutes, although I will be available afterwards for any further questions.

This talk is part of the arg58's list series.


© 2006-2024 Talks.cam, University of Cambridge.