Solved

Creating a bounding box for a 2D PDF file

  • 27 February 2023
  • 3 replies
  • 71 views

Hi,

This is regarding creating a bounding box on a PDF file.

Assume we can get the bounding box details from another source.

Those details cover the whole string 18"-1S6(10S)-471454. Using them, we need to create a bounding box for 471454 only.

 

To find the differences, we manually created the two bounding boxes.

The bounding box details are below, and a screenshot is attached.

#smaller bounding box
"textRegion": {
"xMax": 0.12297021616798368,
"xMin": 0.10935503986439225,
"yMax": 0.6030244780685147,
"yMin": 0.5957594432410297
}

The bigger bounding box is the one we got from the XML:

#bigger bounding box
"textRegion": {
"xMax": 0.1245098493906997,
"xMin": 0.0782598493906997,
"yMax": 0.6047371474808384,
"yMin": 0.5915995895210239
}

 

So we can observe that if we adjust only xMin (shrinking the box from the left) and keep all other values the same, we get a bounding box for 471454.

As we have to do this in many places in the PDF, is there a formula to compute the new xMin from the bigger bounding box's xMin and the string length?

 

Best answer by Dilini Fernando 7 April 2023, 09:40

3 replies

Hi!

I don't think there is a very good way to do that. You could count the characters and assume they all have roughly the same width. If there are M characters to the left of your new x_min and N characters to the right, you would get new_x_min = (N*old_x_min + M*old_x_max) / (N+M). Note, however, that the two boxes differ a bit in the padding around the text, which will be an additional source of error.
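The equal-width estimate above can be sketched in a few lines of Python. This is only an illustration under the stated assumption that every character has about the same width. The function name `suffix_x_min` and the variable names are made up for this example and are not part of any SDK or API.

```python
def suffix_x_min(old_x_min: float, old_x_max: float, text: str, suffix: str) -> float:
    """Estimate xMin for a box covering only `suffix` at the end of `text`.

    Assumes all characters are equally wide; padding around the text is ignored.
    """
    if not suffix or not text.endswith(suffix):
        raise ValueError("suffix must be a non-empty trailing part of text")
    n = len(suffix)        # N: characters to the right of the new x_min
    m = len(text) - n      # M: characters to the left of the new x_min
    return (n * old_x_min + m * old_x_max) / (n + m)


# Bigger bounding box from the question (values copied from the XML box).
bigger = {
    "xMax": 0.1245098493906997,
    "xMin": 0.0782598493906997,
    "yMax": 0.6047371474808384,
    "yMin": 0.5915995895210239,
}

full_text = '18"-1S6(10S)-471454'
new_x_min = suffix_x_min(bigger["xMin"], bigger["xMax"], full_text, "471454")

# Keep every other value and replace only xMin.
estimated = dict(bigger, xMin=new_x_min)
print(estimated)
```

With the numbers from the question (13 characters to the left, 6 to the right), this gives an xMin of roughly 0.1099, against 0.1094 for the manually drawn box. The gap is in line with the padding difference mentioned above, and it will likely be larger for proportional fonts where characters such as `"` and `(` are narrower than digits.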

What is the reason that you want to reduce the box? If the smaller box is what you get from the detect algorithm and you want to compare that with the xml boxes, I can see the point to some extent. For clickable functionality, I would think that having the whole box as a reference would not be a problem. 

The full text in this case contains both an ID/line number for a line and some metadata for the line. For example, I believe its diameter is 18 inches. Logically, the whole text references the line, even though you only need the last part to identify it.

Hi @shubha k v,

I hope the above explanation was helpful. Could you please share some background information about this use case, as requested by @Ola Liabøtrø?

I look forward to hearing from you.

Best regards,
Dilini   

Hi @shubha k v,

I hope Ola's explanation was useful. I'm closing this topic now, but please feel free to leave a comment if you require any further assistance.

Best regards,
Dilini
