Thursday, June 23, 2011

Reading data from a .doc file using the Apache POI API

This program shows how to read data from an MS Word file (.doc) line by line using Apache POI.
I have already explained what Apache POI is and why it is needed in a previous post; you can find that post here.
To run this program, download the Apache POI API and add its jar files to the classpath.
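
If you are not sure whether the jars were picked up, a small check like the one below can help before running the reader. This is only a sketch: the class name ClasspathCheck and the helper isOnClasspath are made up for illustration and are not part of Apache POI. It simply tries to load a class by name and reports whether that worked.

```java
class ClasspathCheck {
    // Hypothetical helper: returns true if the named class can be loaded
    // from the current classpath, false otherwise.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class used by the reader program in this post
        String name = "org.apache.poi.hwpf.HWPFDocument";
        if (isOnClasspath(name)) {
            System.out.println(name + " found on the classpath");
        } else {
            System.out.println(name + " NOT found; add the POI jars to the classpath");
        }
    }
}
```

If the class is reported as missing, check the jar list passed to -cp (or configured in your IDE) and run it again.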

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class NewDocReader {
  public static void main(String args[]) throws FileNotFoundException, IOException {
    // File object pointing at the Word document
    File docFile = new File("c:\\multi\\multi.doc");
    // file input stream over docFile
    FileInputStream finStream = new FileInputStream(docFile.getAbsolutePath());
    // HWPFDocument parses the .doc file; its constructor throws IOException
    HWPFDocument doc = new HWPFDocument(finStream);
    // WordExtractor pulls the plain text out of the parsed document
    WordExtractor wordExtract = new WordExtractor(doc);
    // dataArray stores each line (paragraph) of the document
    String[] dataArray = wordExtract.getParagraphText();
    for (int i = 0; i < dataArray.length; i++) {
      // print each line from the array
      System.out.println(dataArray[i]);
    }
    finStream.close(); // close the FileInputStream
  }
}
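
The strings in the array may still carry their trailing line breaks, and empty paragraphs show up as blank entries, so it can be handy to tidy the array before using it. The sketch below is plain Java with no POI dependency. ParagraphCleaner and clean are hypothetical names chosen for this example, and the sample array only stands in for the result of getParagraphText(); it is not real document data.

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphCleaner {
    // Hypothetical helper: trims whitespace and line terminators from each
    // entry and drops entries that are null or blank after trimming.
    static List<String> clean(String[] paragraphs) {
        List<String> lines = new ArrayList<>();
        for (String p : paragraphs) {
            if (p == null) {
                continue;
            }
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for wordExtract.getParagraphText()
        String[] sample = {"First line\r\n", "\r\n", "  Second line \r\n"};
        for (String line : clean(sample)) {
            System.out.println(line);
        }
    }
}
```

In the reader program you would call ParagraphCleaner.clean(dataArray) and loop over the returned list instead of the raw array.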
