PDF Filter

Sep 24, 2008 at 4:07 PM
Edited Sep 24, 2008 at 4:17 PM
Hi, All
I have implemented this protocol handler for a database (one column points to a file, which I want to index). I can also add crawled properties to the index (from other columns in the database).

If the file is a .txt or .doc file, it works fine and its content can be crawled.
However, for PDF files, I have encountered the following problem:
Crawled (The filtering process could not process this item. This might be because you do not have the latest file filter for this type of item. Install the corresponding filter and retry your crawl. )

BTW, I have installed the PDF filter for MOSS, and it can index a file share content source (PDF files) without any problem.
Have you encountered this kind of problem? How did you solve it?
Any help would be greatly appreciated.
Thanks
Sep 25, 2008 at 12:56 PM
Hi,

Have you added the .pdf file type in the Shared Services' File Types list? That's at: Central Admin > SharedServices1 > Search Settings > File Types.

Clutching at straws a bit here, but does your URI scheme include the file extension? e.g. would you point to your file using [mossph://server/db/16234572.pdf] or [mossph://server/db/16234572]?
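To make the question concrete, here is a minimal Python sketch of the check I have in mind. It assumes the filter is picked from the extension of the last path segment of the URI, which is how IFilter lookup usually works but which I have not confirmed for this handler. `uri_extension` is a hypothetical helper, not part of any MOSS API.

```python
from urllib.parse import urlsplit
import posixpath


def uri_extension(uri: str) -> str:
    """Return the lowercased extension of the URI's last path segment, or '' if none."""
    # urlsplit separates scheme and host from the path, even for custom schemes.
    path = urlsplit(uri).path
    # splitext returns (root, ext); ext includes the leading dot.
    return posixpath.splitext(path)[1].lower()


# Hypothetical URIs taken from the two forms above.
with_ext = uri_extension("mossph://server/db/16234572.pdf")  # ".pdf"
without_ext = uri_extension("mossph://server/db/16234572")   # ""
```

If your URIs look like the second form, the crawler may have nothing to match against the .pdf entry in the File Types list.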

Mike
Sep 29, 2008 at 5:27 PM
Hi, All:

 

It turns out that, for PDF files, we need to load the IFilter ourselves in the BindToFilter method.

Thanks a lot 

 

Nov 10, 2008 at 8:19 AM
@bpwang

Care to share your code?