
Original scientific paper
https://doi.org/10.2498/cit.2002.03.02

Using Inverted Files to Compress Text

Strahil Ristov

Full text: English, pdf (114 KB), pp. 157-161, downloads: 339*
APA 6th Edition
Ristov, S. (2002). Using Inverted Files to Compress Text. Journal of computing and information technology, 10 (3), 157-161. https://doi.org/10.2498/cit.2002.03.02

Abstract
This is the first report on a new approach to text compression. The text file is represented by a compressed inverted file index in conjunction with a very compact lexicon that includes every word in the text. The index is compressed using standard index compression techniques, and the lexicon is compressed by an original dictionary compression method that gives better compression results than existing procedures. The compression procedure is complex, but decompression time is linear in the file size, although decompression requires two passes and hence cannot be performed online. First experiments show that this method, when refined, can be competitive for larger texts where only decompression needs to run in real time.
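
The representation described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`build_index`, `to_gaps`, `rebuild`) are hypothetical, whitespace tokenization is an assumption, the lexicon is left uncompressed, and position gaps stand in for the "standard index compression techniques" as only the first step of such a scheme (the gaps would still need a compact integer code).

```python
from collections import defaultdict
from itertools import accumulate


def build_index(text):
    """Return (lexicon, index): sorted distinct words and, for each word,
    the ascending list of token positions at which it occurs."""
    postings = defaultdict(list)
    for pos, word in enumerate(text.split()):  # assumption: whitespace tokens
        postings[word].append(pos)
    lexicon = sorted(postings)
    return lexicon, [postings[w] for w in lexicon]


def to_gaps(positions):
    """Replace absolute positions by differences between neighbours.
    Gaps are small integers, which a variable-length code stores compactly."""
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]


def rebuild(lexicon, gap_index):
    """Reconstruct the token sequence from the lexicon and gap lists."""
    n = sum(len(gaps) for gaps in gap_index)  # total number of tokens
    slots = [None] * n
    for word, gaps in zip(lexicon, gap_index):
        for pos in accumulate(gaps):  # running sum restores positions
            slots[pos] = word
    return " ".join(slots)


text = "to be or not to be"
lexicon, index = build_index(text)
gap_index = [to_gaps(p) for p in index]
restored = rebuild(lexicon, gap_index)
```

In this sketch every position is written exactly once, so reconstruction is linear in the number of tokens, but the first word cannot be emitted until the posting lists have been read. That is one way to see why decoding of this kind is not online. The round trip is exact only for text whose tokens are separated by single spaces.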

Hrčak ID: 44774

URI
https://hrcak.srce.hr/44774

Visits: 439*