List of words in the document along with the number of occurrences: the 139 and 92 TET 87 to 63 of 59 for 58 PDF 53 text 51 in 50 a 46 is 38 be 36 as 34 are 34 PDFlib 33 or 32 with 26 can 23 Unicode 22 on 22 image 21 page 20 The 18 document 17 color 16 other 16 characters 16 products 16 images 15 documents 15 which 15 e 15 g 14 TETML 13 all 13 Text 12 font 12 character 12 used 12 Glyph 12 extracts 11 metadata 11 from 11 available 11 contents 11 information 11 extracted 11 such 11 output 11 not 10 may 10 common 9 word 9 various 9 GmbH 9 size 9 form 9 contains 8 many 8 correctly 8 use 8 glyph 7 formats 7 into 7 will 7 parts 7 pdflib 7 com 7 TIFF 7 XMP 7 more 7 it 7 processing 7 It 7 by 7 software 7 but 7 Extraction 6 PDFs 6 including 6 In 6 C 6 languages 6 programming 6 mapping 6 This 6 extraction 6 www 6 details 6 applications 6 only 6 XML 6 F1 6 width 6 often 6 s 6 single 6 channels 6 datasheet 5 Image 5 well 5 an 5 interface 5 etc 5 search 5 Acrobat 5 includes 5 proper 5 library 5 separate 5 following 5 that 5 connector 5 must 5 small 5 two 5 Other 5 extract 5 support 5 spot 5 position 4 pCOS 4 corresponding 4 words 4 instances 4 order 4 example 4 pages 4 JPEG 4 combined 4 since 4 product 4 where 4 tool 4 fonts 4 X 4 Windows 4 if 4 forms 4 process 4 detects 4 removes 4 bold 4 create 4 For 4 top 4 this 4 ligatures 4 initial 4 fragmented 4 command-line 4 offer 4 deployment 4 server 4 optionally 3 XML-based 3 format 3 content 3 columns 3 redundant 3 engine 3 based 3 addition 3 supports 3 versions 3 up 3 Ligatures 3 glyphs 3 appropriate 3 problems 3 packages 3 TeX 3 Word 3 Detection 3 several 3 patented 3 hyphenated 3 shadow 3 over 3 analyzed 3 multiple 3 each 3 included 3 Images 3 no 3 see 3 cannot 3 improve 3 Adobe 3 OS 3 additional 3 replace 3 one 3 emit 3 requirements 3 contain 3 While 3 most 3 bookmarks 3 file 3 processed 3 present 3 variety 3 also 3 even 3 make 3 XSLT 3 connectors 3 provide 3 code 3 environments 3 Search 3 Server 3 Microsoft 3 Cookbook 3 
Challenges 3 different 3 shadowed 3 multiply 3 using 3 result 3 algorithm 3 placed 3 combinations 3 delivers 3 Drop 3 If 3 pieces 3 Arabic 3 damaged 3 spaces 3 CMYK 3 colors 3 DeviceN 3 any 3 fragments 3 thousands 3 development 3 batch 3 suitable 3 suited 3 Products 3 We 3 NET 3 licenses 3 Toolkit 2 What 2 makes 2 strings 2 plus 2 Raster 2 converts 2 called 2 resource 2 analysis 2 algorithms 2 boundaries 2 Using 2 integrated 2 retrieve 2 interactive 2 elements 2 their 2 headings 2 PDI 2 placing 2 relevant 2 flavors 2 All 2 ISO 2 require 2 Damaged 2 Since 2 encoded 2 sequence 2 identified 2 avoid 2 implements 2 workarounds 2 InDesign 2 systems 2 Analysis 2 required 2 Recombine 2 Table 2 determine 2 span 2 precise 2 Color 2 files 2 larger 2 facilitate 2 fidelity 2 guaranteed 2 conversion 2 quality 2 info 2 problematic 2 kinds 2 features 2 problem 2 names 2 plugin 2 remove 2 wide 2 variants 2 standard 2 four 2 meet 2 Web 2 than 2 domains 2 custom 2 fields 2 annotations 2 standards 2 individual 2 represents 2 tools 2 sure 2 creates 2 stylesheets 2 filters 2 Box 2 functionality 2 IFilter 2 tasks 2 samples 2 combine 2 lines 2 hyphen 2 treated 2 separately 2 they 2 still 2 programs 2 first 2 both 2 extracting 2 caps 2 paragraph 2 remainder 2 photographs 2 values 2 case 2 does 2 usable 2 unusable 2 garbage 2 Hebrew 2 logical 2 runs 2 left 2 left-to-right 2 so 2 heavily 2 displayed 2 subset 2 these 2 Separation 2 plain 2 consist 2 bottom 2 Both 2 offers 2 options 2 integration 2 workflows 2 core 2 free 2 IBM 2 i5 2 iSeries 2 iOS 2 performance 2 Application 2 VB 2 ASP 2 Java 2 Ruby 2 worldwide 2 our 2 world 2 Support 2 have 2 response 2 times 2 site 2 reliably 1 detailed 1 advanced 1 determining 1 grouping 1 removing 1 arbitrary 1 objects 1 With 1 Implement 1 indexer 1 Repurpose 1 Convert 1 Process 1 splitting 1 requires 1 Check 1 whether 1 particular 1 location 1 empty 1 barcode 1 stamp 1 Features 1 Accepted 1 Input 1 input 1 DC 1 Protected 1 do 1 password 1 opening 1 
repaired 1 usually 1 normalizes 1 non- 1 aware 1 returned 1 UTF-8 1 UTF-16 1 native 1 Unicode-capable 1 multi-character 1 decomposed 1 Glyphs 1 without 1 mapped 1 configurable 1 replacement 1 misinterpretation 1 specific 1 creation 1 generated 1 mainframe 1 Content 1 Determine 1 dehyphenation 1 Remove 1 duplicate 1 artificially 1 bolded 1 paragraphs 1 reading 1 Correctly 1 scattered 1 Page 1 Layout 1 Tables 1 detected 1 cells 1 improves 1 ordering 1 rows 1 table 1 cell 1 Geometry 1 provides 1 metrics 1 widths 1 direction 1 Specific 1 areas 1 excluded 1 ignore 1 headers 1 footers 1 margins 1 analyzes 1 description 1 returns 1 identify 1 highlighted 1 JBIG2 1 Precise 1 geometric 1 angles 1 reported 1 Fragmented 1 repurposing 1 downsampling 1 occurs 1 ensures 1 highest 1 possible 1 querying 1 about 1 lists 1 Configuration 1 Options 1 special 1 handling 1 configuration 1 customized 1 via 1 user-supplied 1 tables 1 codes 1 FontReporter 1 auxiliary 1 analyzing 1 encodings 1 works 1 freely 1 Embedded 1 find 1 hints 1 External 1 system 1 results 1 embedded 1 Postprocessing 1 postprocessing 1 steps 1 Foldings 1 preserve 1 punctuation 1 irrelevant 1 scripts 1 Decompositions 1 equivalent 1 narrow 1 vertical 1 Japanese 1 Latin 1 superscript 1 respective 1 counterparts 1 converted 1 normalization 1 NFC 1 database 1 Document 1 Domains 1 places 1 deal 1 situations 1 predefined 1 entries 1 level 1 attachments 1 portfolios 1 recursively 1 comments 1 general 1 properties 1 queried 1 count 1 conformance 1 like 1 A 1 Metadata 1 ways 1 programmatically 1 Contents 1 flavor 1 easily 1 actual 1 colorspaces 1 analyze 1 JavaScript 1 space 1 ICC 1 profiles 1 intents 1 governed 1 schema 1 always 1 consistent 1 reliable 1 apply 1 certain 1 convert 1 Sample 1 distribution 1 fragment 1 shows 1 llx 1 lly 1 urx 1 ury 1 P 1 D 1 F 1 l 1 i 1 b 1 Connectors 1 necessary 1 glue 1 Lucene 1 Engine 1 Solr 1 TIKA 1 toolkit 1 Oracle 1 MediaWiki 1 retrieval 1 collection 1 examples 1 demonstrate 1 Several 1 
show 1 how 1 enhance 1 add 1 links 1 Dehyphenation 1 combines 1 complete 1 important 1 searches 1 full 1 successful 1 although 1 Dashes 1 hyphens 1 removed 1 Shadow 1 artifical 1 Digital 1 effect 1 achieved 1 offset 1 between 1 Similarly 1 simulated 1 overprinting 1 same 1 As 1 once 1 detection 1 identifies 1 excess 1 copies 1 extra 1 hit 1 hits 1 would 1 found 1 duplicated 1 Accented 1 Characters 1 accents 1 diacritical 1 marks 1 close 1 Some 1 typesetting 1 notably 1 base 1 accent 1 letter 1 then 1 dieresis 1 situation 1 recombines 1 fi 1 fl 1 ffi 1 less 1 Th 1 sp 1 ct 1 st 1 others 1 When 1 digital 1 separated 1 constituent 1 allow 1 Caps 1 large 1 at 1 beginning 1 aligns 1 line 1 drops 1 down 1 emphasize 1 start 1 properly 1 keeps 1 dash 1 Inttrroduccttiion 1 Introduction 1 Midi-Pyr 1 en 1 ees 1 Midi-Pyrénées 1 rst 1 drop 1 cap 1 S 1 tellen 1 Stellen 1 Mapping 1 foundation 1 every 1 assigned 1 value 1 complicates 1 supporting 1 encoding 1 assign 1 worst 1 enough 1 cascaded 1 takes 1 deliver 1 while 1 Bidirectional 1 encode 1 simply 1 container 1 script 1 right 1 inserts 1 numbers 1 Western 1 interpreted 1 directions 1 hence 1 term 1 bidirectional 1 poses 1 challenges 1 contextual 1 These 1 shaped 1 normalized 1 isolated 1 Documents 1 get 1 because 1 transmission 1 errors 1 repair 1 mode 1 recovers 1 Sometimes 1 Even 1 extreme 1 cases 1 reorders 1 visual 1 mixture 1 right-to-left 1 Spaces 1 Compression 1 data 1 combination 1 eleven 1 nine 1 compression 1 balances 1 characteristics 1 capabilities 1 Regardless 1 internal 1 structure 1 pixel 1 Spot 1 Colors 1 Technically 1 known 1 single-channel 1 Device-dependent 1 CIE-based 1 Special 1 DeviceGray 1 CalGray 1 Indexed 1 DeviceRGB 1 CalRGB 1 Pattern 1 DeviceCMYK 1 Lab 1 ICCBased 1 processes 1 intended 1 need 1 superior 1 accept 1 Cyan 1 Magenta 1 missing 1 added 1 created 1 However 1 some 1 able 1 handle 1 restricted 1 instructed 1 channel 1 grayscale 1 Merging 1 broken 1 producing 1 appears 1 actually 1 Office 1 
produce 1 hundreds 1 segments 1 varying 1 merges 1 Only 1 merging 1 reasonably 1 repurposed 1 Photoshop 1 displays 1 window 1 Double-clicking 1 icons 1 reveals 1 alternate 1 Although 1 segmented 1 smaller 1 reusable 1 Many 1 Ways 1 operations 1 similar 1 scenarios 1 component 1 desktop 1 Examples 1 package 1 doesn 1 t 1 integrate 1 complex 1 developers 1 who 1 familiar 1 range 1 integrating 1 databases 1 engines 1 Family 1 family 1 comprises 1 described 1 Share- 1 Point 1 SQL 1 Plugin 1 utility 1 evaluate 1 interactively 1 Supported 1 Development 1 Environments 1 everywhere 1 practically 1 computing 1 platforms 1 Linux 1 Unix 1 mainframes 1 mobile 1 Android 1 written 1 highly 1 optimized 1 maximum 1 overhead 1 Via 1 simple 1 API 1 Programming 1 Interface 1 accessible 1 COM 1 servlets 1 Objective-C 1 Perl 1 PHP 1 Python 1 REALbasic 1 Xojo 1 RPG 1 Rails 1 Benefits 1 Software 1 Rock-solid 1 Tens 1 programmers 1 working 1 meets 1 robust 1 unattended 1 Speed 1 Simplicity 1 incredibly 1 fast 1 per 1 second 1 straightforward 1 easy 1 learn 1 World 1 Our 1 international 1 They 1 customers 1 Professional 1 there 1 we 1 try 1 help 1 commercial 1 business-critical 1 By 1 adding 1 access 1 latest 1 should 1 arise 1 Licensing 1 licensing 1 source 1 contracts 1 extended 1 technical 1 short 1 updates 1 About 1 completely 1 focused 1 technology 1 Customers 1 company 1 closely 1 follows 1 market 1 trends 1 distributed 1 major 1 markets 1 North 1 America 1 Europe 1 Japan 1 Contact 1 Fully 1 functional 1 evaluation 1 documentation 1 please 1 contact 1 Franziska-Bilek-Weg 1 München 1 Germany 1 phone 1 fax 1 sales 1 Total unique words: 968
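A list in this shape (words sorted by descending count, followed by the number of unique words) can be reproduced from any extracted text with a short counting routine. The following is a minimal sketch, not the tool that generated the list above. The function names `word_frequencies` and `format_report` are hypothetical, and the tokenization rule (case-sensitive runs of letters and digits, optionally joined by hyphens) is an assumption inferred from entries such as "e 15 g 14", "command-line 4" and the separate counts for "the" and "The".

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs sorted by descending count.

    Assumption: a word is a run of letters/digits, optionally joined by
    hyphens (a trailing hyphen is kept); case is preserved, so "The" and
    "the" are counted separately.
    """
    tokens = re.findall(r"[^\W_]+(?:-[^\W_]+)*-?", text)
    # most_common() keeps first-seen order among words with equal counts.
    return Counter(tokens).most_common()


def format_report(text):
    """Format the counts as 'word count word count ... Total unique words: N'."""
    pairs = word_frequencies(text)
    body = " ".join(f"{word} {count}" for word, count in pairs)
    return f"{body} Total unique words: {len(pairs)}"


if __name__ == "__main__":
    print(format_report("the TET and the PDF, e.g. the TET"))
```

Here the input string is supplied directly; in practice it would be the text already extracted from the document.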