Extracting text from a specific area of a PDF page using ICEpdf

Updated: 2022-12-03 13:43:43

To get the text of a page, you can call the following method on the document:

PageText pageText = document.getPageText(pageNumber);

This is similar to the bundled example ./examples/extraction/PageTextExtraction.java

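The PageText object contains all the LineText -> WordText -> GlyphText objects for the page. LineText, WordText, and GlyphText all extend AbstractText, which has a getBounds() method. The bounds of these objects are in PDF user space, which is the 1st geometric quadrant (origin at the bottom-left, y increasing upward), whereas Java2D works in the 4th geometric quadrant (origin at the top-left, y increasing downward). A rectangle selected on screen therefore has to be converted to page space before it is compared with the text bounds. Assuming you already have the selectionRectangle, the code would be as follows: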


// clear the currently selected state, ignore highlighted.
currentPage.getViewText().clearSelected();

// get page transform, same for all calculations
AffineTransform pageTransform = currentPage.getPageTransform(
        Page.BOUNDARY_CROPBOX,
        documentViewModel.getViewRotation(),
        documentViewModel.getViewZoom());

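// convert the on-screen selection to page space; the helper returns
// a Rectangle2D, or null if the page transform cannot be inverted
Rectangle2D pageSpaceSelectRectangle =
        convertRectangleToPageSpace(selectionRectangle, pageTransform);
ArrayList<LineText> pageLines = pageText.getPageLines();
if (pageSpaceSelectRectangle != null && pageLines != null) {
    for (LineText pageLine : pageLines) {
        // check for intersection, if so break into words.
        if (pageLine.getBounds().intersects(pageSpaceSelectRectangle)) {
            // you have some selected text.
        }
    }
}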
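
The loop above stops at line level and leaves the "break into words" step open. As a rough sketch of that step, the same bounds test can be repeated for each word of each line. The `Word` class, `sampleLines()`, and `textInside()` below are hypothetical stand-ins written for this illustration, not ICEpdf classes or methods; with ICEpdf you would run the equivalent test against the bounds of the word objects of each intersecting line.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SelectionSketch {

    // Hypothetical stand-in for a text fragment with page-space bounds;
    // this is NOT an ICEpdf class.
    static final class Word {
        final String text;
        final Rectangle2D bounds;

        Word(String text, double x, double y, double w, double h) {
            this.text = text;
            this.bounds = new Rectangle2D.Double(x, y, w, h);
        }
    }

    // Joins the text of every word whose bounds intersect the selection,
    // visiting lines and words in the order they are given.
    static String textInside(List<List<Word>> lines, Rectangle2D selection) {
        StringBuilder out = new StringBuilder();
        for (List<Word> line : lines) {
            for (Word word : line) {
                if (word.bounds.intersects(selection)) {
                    if (out.length() > 0) {
                        out.append(' ');
                    }
                    out.append(word.text);
                }
            }
        }
        return out.toString();
    }

    // Made-up sample data: two lines of text in page space.
    static List<List<Word>> sampleLines() {
        List<List<Word>> lines = new ArrayList<>();
        lines.add(Arrays.asList(
                new Word("Invoice", 50, 700, 55, 12),
                new Word("No.", 115, 700, 25, 12),
                new Word("4711", 145, 700, 30, 12)));
        lines.add(Arrays.asList(
                new Word("Total", 50, 680, 40, 12)));
        return lines;
    }

    public static void main(String[] args) {
        // Selection already converted to page space.
        Rectangle2D selection = new Rectangle2D.Double(110, 695, 100, 20);
        System.out.println(textInside(sampleLines(), selection)); // prints "No. 4711"
    }
}
```

Using `intersects` picks up words that are only partly covered by the selection; if only fully enclosed words should count, `selection.contains(word.bounds)` would be the stricter test.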



    /**
     * Converts the rectangle to the space specified by the page transform. This
     * is a utility method for converting a selection rectangle to page space
     * so that an intersection can be calculated to determine a selected state.
     *
     * @param mouseRect     rectangle to convert space of
     * @param pageTransform page transform
     * @return converted rectangle.
     */
    private Rectangle2D convertRectangleToPageSpace(Rectangle mouseRect,
                                                    AffineTransform pageTransform) {
        GeneralPath shapePath;
        try {
            AffineTransform transform = pageTransform.createInverse();
            shapePath = new GeneralPath(mouseRect);
            shapePath.transform(transform);
            return shapePath.getBounds2D();
        } catch (NoninvertibleTransformException e) {
            logger.log(Level.SEVERE,
                    "Error converting mouse point to page space.", e);
        }
        return null;
    }
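
To see what the inverse transform does to a selection, here is a small self-contained sketch that uses only java.awt.geom. The `samplePageTransform()` method is a hypothetical stand-in for the transform a viewer would supply: it assumes an unrotated page, a given page height, and a zoom factor, and is not how ICEpdf builds its page transform. The class and method names are likewise made up for this illustration.

```java
import java.awt.Rectangle;
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Rectangle2D;

class PageSpaceDemo {

    // Hypothetical page transform: maps PDF user space (origin bottom-left,
    // y up) to view space (origin top-left, y down) for an unrotated page.
    static AffineTransform samplePageTransform(double pageHeight, double zoom) {
        AffineTransform at = new AffineTransform();
        at.scale(zoom, -zoom);         // apply zoom and flip the y axis
        at.translate(0, -pageHeight);  // put the view origin at the page top
        return at;
    }

    // Maps a view-space rectangle back to page space via the inverse transform.
    static Rectangle2D toPageSpace(Rectangle viewRect, AffineTransform pageTransform) {
        try {
            return pageTransform.createInverse()
                    .createTransformedShape(viewRect)
                    .getBounds2D();
        } catch (NoninvertibleTransformException e) {
            throw new IllegalArgumentException("Page transform is not invertible", e);
        }
    }

    public static void main(String[] args) {
        // A 792 pt high page shown at 200% zoom.
        AffineTransform pageTransform = samplePageTransform(792, 2.0);
        // Mouse selection in view pixels: x=100, y=200, 300 wide, 50 high.
        Rectangle selection = new Rectangle(100, 200, 300, 50);
        Rectangle2D inPageSpace = toPageSpace(selection, pageTransform);
        // Page space result: x=50, y=667, width=150, height=25.
        System.out.println(inPageSpace.getX() + ", " + inPageSpace.getY() + ", "
                + inPageSpace.getWidth() + ", " + inPageSpace.getHeight());
    }
}
```

The zoom halves every length, and the y flip turns a rectangle 200 pixels below the top of the view into one whose lower edge sits 667 points above the bottom of the page, which is the space the text bounds live in.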