Updated: 2022-06-22 22:45:33
You can loop through the items found on the page using page.GetIterator(). For each individual item you can get a 'bounding box', which is a Tesseract.Rect (a rectangle struct) containing the X1, Y1, X2 and Y2 coordinates (X1, Y1 being the top-left corner and X2, Y2 the bottom-right corner).
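// 'Engine' is assumed to be a Tesseract.TesseractEngine and 'img' the Tesseract.Pix image to process.
// Choose the granularity to iterate over: Block, Para, TextLine, Word or Symbol.
Tesseract.PageIteratorLevel myLevel = Tesseract.PageIteratorLevel.Word;
using (var page = Engine.Process(img))
using (var iter = page.GetIterator())
{
    iter.Begin();
    do
    {
        if (iter.TryGetBoundingBox(myLevel, out var rect))
        {
            var curText = iter.GetText(myLevel);
            // Your code here: 'rect' contains the location of the text, 'curText' contains the recognized text itself.
        }
    } while (iter.Next(myLevel));
}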
There is no clear-cut way to use the positions in the input to space the text in the output. You're going to have to write some custom logic for that.
You might be able to estimate the number of spaces you need to the left of your text with something like this:
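// inputWidth: width of the source image in pixels; outputWidthSpaces: width of the output in characters.
// The cast to double avoids integer division, which would otherwise truncate the ratio to 0.
var padLeftSpaces = (int)Math.Round((double)rect.X1 / inputWidth * outputWidthSpaces);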
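Besides the left padding, the custom logic also has to decide which items belong on the same output line. Below is a minimal sketch of one possible approach, not something Tesseract provides. It assumes each word has already been copied out of the iterator loop into a hypothetical (Text, X1, Y1) tuple. It groups words whose Y1 values lie within a chosen lineTolerance of each other, then places each word at the column given by the padLeftSpaces estimate. The names Render and lineTolerance and the sample values are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical input: each word as a (Text, X1, Y1) tuple. In real code X1/Y1 would be
// copied from the Tesseract.Rect returned by TryGetBoundingBox.
var words = new List<(string Text, int X1, int Y1)>
{
    ("Name:", 20, 12), ("Alice", 400, 15),
    ("Age:", 20, 60), ("42", 400, 58),
};
// Assumed: an 800-pixel-wide image rendered into 40 output columns.
Console.Write(Render(words, inputWidth: 800, outputWidthSpaces: 40, lineTolerance: 10));

// Lays words out as plain text, one output line per detected text line.
// lineTolerance is the largest Y1 difference (in pixels) still treated as the same line.
static string Render(IEnumerable<(string Text, int X1, int Y1)> items,
                     int inputWidth, int outputWidthSpaces, int lineTolerance)
{
    // Sort top to bottom, then start a new line whenever a word's Y1 is further than
    // lineTolerance from the Y1 of the first word on the current line.
    var lines = new List<List<(string Text, int X1, int Y1)>>();
    foreach (var w in items.OrderBy(x => x.Y1).ThenBy(x => x.X1))
    {
        if (lines.Count > 0 && Math.Abs(w.Y1 - lines[lines.Count - 1][0].Y1) <= lineTolerance)
            lines[lines.Count - 1].Add(w);
        else
            lines.Add(new List<(string Text, int X1, int Y1)> { w });
    }

    var output = new StringBuilder();
    foreach (var line in lines)
    {
        var row = new StringBuilder();
        foreach (var w in line.OrderBy(x => x.X1))
        {
            // Same estimate as padLeftSpaces, computed in floating point.
            int column = (int)Math.Round((double)w.X1 / inputWidth * outputWidthSpaces);
            // Never overlap the previous word: keep at least one space after it.
            int minColumn = row.Length == 0 ? 0 : row.Length + 1;
            row.Append(' ', Math.Max(column, minColumn) - row.Length);
            row.Append(w.Text);
        }
        output.Append(row.ToString()).Append('\n');
    }
    return output.ToString();
}
```

Because of rounding and the one-space minimum, the columns are only approximate. The result will likely look best in a monospaced font, and lineTolerance probably needs tuning to the text size of your scans.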