Leptonica  1.77.0
Image processing and image analysis suite
ptafunc2.c File Reference
#include "allheaders.h"

Go to the source code of this file.

Functions

PTAptaSort (PTA *ptas, l_int32 sorttype, l_int32 sortorder, NUMA **pnaindex)
 
l_ok ptaGetSortIndex (PTA *ptas, l_int32 sorttype, l_int32 sortorder, NUMA **pnaindex)
 
PTAptaSortByIndex (PTA *ptas, NUMA *naindex)
 
PTAAptaaSortByIndex (PTAA *ptaas, NUMA *naindex)
 
l_ok ptaGetRankValue (PTA *pta, l_float32 fract, PTA *ptasort, l_int32 sorttype, l_float32 *pval)
 
PTAptaUnionByAset (PTA *pta1, PTA *pta2)
 
PTAptaRemoveDupsByAset (PTA *ptas)
 
PTAptaIntersectionByAset (PTA *pta1, PTA *pta2)
 
L_ASETl_asetCreateFromPta (PTA *pta)
 
PTAptaUnionByHash (PTA *pta1, PTA *pta2)
 
l_ok ptaRemoveDupsByHash (PTA *ptas, PTA **pptad, L_DNAHASH **pdahash)
 
PTAptaIntersectionByHash (PTA *pta1, PTA *pta2)
 
l_ok ptaFindPtByHash (PTA *pta, L_DNAHASH *dahash, l_int32 x, l_int32 y, l_int32 *pindex)
 
L_DNAHASHl_dnaHashCreateFromPta (PTA *pta)
 

Detailed Description


This file has these Pta utilities:
  • sorting
  • ordered set operations

    - hash map operations

     Sorting
          PTA        *ptaSort()
          l_int32     ptaGetSortIndex()
          PTA        *ptaSortByIndex()
          PTAA       *ptaaSortByIndex()
          l_int32     ptaGetRankValue()
     Set operations using aset (rbtree)
          PTA        *ptaUnionByAset()
          PTA        *ptaRemoveDupsByAset()
          PTA        *ptaIntersectionByAset()
          L_ASET     *l_asetCreateFromPta()
     Set operations using hashing (dnahash)
          PTA        *ptaUnionByHash()
          l_int32     ptaRemoveDupsByHash()
          PTA        *ptaIntersectionByHash();
          l_int32     ptaFindPtByHash()
          L_DNAHASH  *l_dnaHashCreateFromPta()
We have two implementations of set operations on an array of points:
  (1) Using an underlying tree (rbtree)
      This uses a good 64 bit hashing function for the key,
      that is not expected to have hash collisions (and we do
      not test for them).  The tree is built up of the hash
      values, and if the hash is found in the tree, it is
      assumed that the point has already been found.
  (2) Using an underlying hashing of the keys (dnahash)
      This uses a fast 64 bit hashing function for the key,
      which is then hashed into a bucket (a dna in a dnaHash).
      Because hash collisions can occur, the index into the
      pta for the point that gave rise to that key is stored,
      and the dna (bucket) is traversed, using the stored indices
      to determine if that point had already been seen.
 

Definition in file ptafunc2.c.

Function Documentation

◆ l_asetCreateFromPta()

L_ASET* l_asetCreateFromPta ( PTA pta)

l_asetCreateFromPta()

Parameters
[in]pta
Returns
set using a 64-bit hash of (x,y) as the key

Definition at line 458 of file ptafunc2.c.

Referenced by ptaIntersectionByAset().

◆ l_dnaHashCreateFromPta()

L_DNAHASH* l_dnaHashCreateFromPta ( PTA pta)

l_dnaHashCreateFromPta()

Parameters
[in]pta
Returns
dahash, or NULL on error

Definition at line 727 of file ptafunc2.c.

References findNextLargerPrime(), l_dnaHashAdd(), l_dnaHashCreate(), l_hashPtToUint64(), ptaGetCount(), and ptaGetIPt().

Referenced by ptaIntersectionByHash().

◆ ptaaSortByIndex()

PTAA* ptaaSortByIndex ( PTAA ptaas,
NUMA naindex 
)

ptaaSortByIndex()

Parameters
[in]ptaas
[in]naindexna that maps from the new ptaa to the input ptaa
Returns
ptaad sorted, or NULL on error

Definition at line 225 of file ptafunc2.c.

References L_COPY, L_INSERT, numaGetCount(), numaGetIValue(), ptaaAddPta(), ptaaCreate(), ptaaGetCount(), and ptaaGetPta().

◆ ptaFindPtByHash()

l_ok ptaFindPtByHash ( PTA pta,
L_DNAHASH dahash,
l_int32  x,
l_int32  y,
l_int32 *  pindex 
)

ptaFindPtByHash()

Parameters
[in]pta
[in]dahashbuilt from pta
[in]x,yarbitrary points
[out]pindexindex into pta if (x,y) is in pta; -1 otherwise
Returns
0 if OK, 1 on error
Notes:
     (1) Fast lookup in dnaHash associated with a pta, to see if a
         random point (x,y) is already stored in the hash table.
     (2) We use a strong hash function to minimize the chance that
         two different points hash to the same key value.
     (3) We select the number of buckets to be about 5% of the size
         of the input pta, so that when fully populated, each
         bucket (dna) will have about 20 entries, each being an index
         into pta.  In lookup, after hashing to the key, and then
         again to the bucket, we traverse the bucket (dna), using the
         index into pta to check if the point (x,y) has been found before.

Definition at line 681 of file ptafunc2.c.

References l_dnaGetCount(), l_dnaGetIValue(), l_dnaHashGetDna(), l_hashPtToUint64(), L_NOCOPY, and ptaGetIPt().

Referenced by ptaIntersectionByHash(), and ptaRemoveDupsByHash().

◆ ptaGetRankValue()

l_ok ptaGetRankValue ( PTA pta,
l_float32  fract,
PTA ptasort,
l_int32  sorttype,
l_float32 *  pval 
)

ptaGetRankValue()

Parameters
[in]pta
[in]fractuse 0.0 for smallest, 1.0 for largest
[in]ptasort[optional] version of pta sorted by sorttype
[in]sorttypeL_SORT_BY_X, L_SORT_BY_Y
[out]pval&rankval: the x or y value at fract
Returns
0 if OK, 1 on error

Definition at line 264 of file ptafunc2.c.

References L_SORT_BY_X, L_SORT_BY_Y, L_SORT_INCREASING, ptaDestroy(), ptaGetCount(), ptaGetPt(), and ptaSort().

◆ ptaGetSortIndex()

l_ok ptaGetSortIndex ( PTA ptas,
l_int32  sorttype,
l_int32  sortorder,
NUMA **  pnaindex 
)

ptaGetSortIndex()

Parameters
[in]ptas
[in]sorttypeL_SORT_BY_X, L_SORT_BY_Y
[in]sortorderL_SORT_INCREASING, L_SORT_DECREASING
[out]pnaindexindex of sorted order into original array
Returns
0 if OK, 1 on error

Definition at line 139 of file ptafunc2.c.

References L_SORT_BY_X, L_SORT_BY_Y, L_SORT_DECREASING, L_SORT_INCREASING, numaAddNumber(), numaCreate(), numaDestroy(), numaGetSortIndex(), ptaGetCount(), and ptaGetPt().

Referenced by ptaSort().

◆ ptaIntersectionByAset()

PTA* ptaIntersectionByAset ( PTA pta1,
PTA pta2 
)

ptaIntersectionByAset()

Parameters
[in]pta1,pta2
Returns
ptad intersection of the point sets, or NULL on error
Notes:
     (1) See sarrayIntersectionByAset() for the approach.
     (2) The key is a 64-bit hash from the (x,y) pair.
     (3) This is slower than ptaIntersectionByHash(), mostly because
         of the nlogn sort to build up the rbtree.  Do not use for
         large numbers of points (say, > 1M).

Definition at line 408 of file ptafunc2.c.

References l_asetCreateFromPta(), ptaCreate(), and ptaGetCount().

◆ ptaIntersectionByHash()

PTA* ptaIntersectionByHash ( PTA pta1,
PTA pta2 
)

ptaIntersectionByHash()

Parameters
[in]pta1,pta2
Returns
ptad intersection of the point sets, or NULL on error
Notes:
     (1) This is faster than ptaIntersectionByAset(), because the
         bucket lookup is O(n).  It should be used if the pts are
         integers (e.g., representing pixel positions).

Definition at line 607 of file ptafunc2.c.

References findNextLargerPrime(), l_dnaHashAdd(), l_dnaHashCreate(), l_dnaHashCreateFromPta(), l_dnaHashDestroy(), l_hashPtToUint64(), ptaAddPt(), ptaCreate(), ptaFindPtByHash(), ptaGetCount(), and ptaGetIPt().

◆ ptaRemoveDupsByAset()

PTA* ptaRemoveDupsByAset ( PTA ptas)

ptaRemoveDupsByAset()

Parameters
[in]ptasassumed to be integer values
Returns
ptad with duplicates removed, or NULL on error
Notes:
     (1) This is slower than ptaRemoveDupsByHash(), mostly because
         of the nlogn sort to build up the rbtree.  Do not use for
         large numbers of points (say, > 1M).

Definition at line 361 of file ptafunc2.c.

Referenced by generatePtaBoxa(), generatePtaHashBoxa(), generatePtaPolyline(), and ptaUnionByAset().

◆ ptaRemoveDupsByHash()

l_ok ptaRemoveDupsByHash ( PTA ptas,
PTA **  pptad,
L_DNAHASH **  pdahash 
)

ptaRemoveDupsByHash()

Parameters
[in]ptasassumed to be integer values
[out]pptadunique set of pts; duplicates removed
[out]pdahash[optional] dnahash used for lookup
Returns
0 if OK, 1 on error
Notes:
     (1) Generates a pta with unique values.
     (2) The dnahash is built up with ptad to assure uniqueness.
         It can be used to find if a point is in the set:
             ptaFindPtByHash(ptad, dahash, x, y, &index)
     (3) The hash of the (x,y) location is simple and fast.  It scales
         up with the number of buckets to insure a fairly random
         bucket selection for adjacent points.
     (4) A Dna is used rather than a Numa because we need accurate
         representation of 32-bit integers that are indices into ptas.
         Integer –> float –> integer conversion makes errors for
         integers larger than 10M.
     (5) This is faster than ptaRemoveDupsByAset(), because the
         bucket lookup is O(n), although there is a double-loop
         lookup within the dna in each bucket.

Definition at line 550 of file ptafunc2.c.

References findNextLargerPrime(), l_dnaHashAdd(), l_dnaHashCreate(), l_dnaHashDestroy(), l_hashPtToUint64(), ptaAddPt(), ptaCreate(), ptaFindPtByHash(), ptaGetCount(), and ptaGetIPt().

Referenced by ptaUnionByHash().

◆ ptaSort()

PTA* ptaSort ( PTA ptas,
l_int32  sorttype,
l_int32  sortorder,
NUMA **  pnaindex 
)

ptaSort()

Parameters
[in]ptas
[in]sorttypeL_SORT_BY_X, L_SORT_BY_Y
[in]sortorderL_SORT_INCREASING, L_SORT_DECREASING
[out]pnaindex[optional] index of sorted order into original array
Returns
ptad sorted version of ptas, or NULL on error

Definition at line 96 of file ptafunc2.c.

References L_SORT_BY_X, L_SORT_BY_Y, L_SORT_DECREASING, L_SORT_INCREASING, numaDestroy(), ptaGetSortIndex(), and ptaSortByIndex().

Referenced by ptaGetRankValue().

◆ ptaSortByIndex()

PTA* ptaSortByIndex ( PTA ptas,
NUMA naindex 
)

ptaSortByIndex()

Parameters
[in]ptas
[in]naindexna that maps from the new pta to the input pta
Returns
ptad sorted, or NULL on error

Definition at line 189 of file ptafunc2.c.

References numaGetCount(), numaGetIValue(), ptaAddPt(), ptaCreate(), and ptaGetPt().

Referenced by ptaSort().

◆ ptaUnionByAset()

PTA* ptaUnionByAset ( PTA pta1,
PTA pta2 
)

ptaUnionByAset()

Parameters
[in]pta1,pta2
Returns
ptad with the union of the set of points, or NULL on error
Notes:
     (1) See sarrayRemoveDupsByAset() for the approach.
     (2) The key is a 64-bit hash from the (x,y) pair.
     (3) This is slower than ptaUnionByHash(), mostly because of the
         nlogn sort to build up the rbtree.  Do not use for large
         numbers of points (say, > 1M).
     (4) The *Aset() functions use the sorted l_Aset, which is just
         an rbtree in disguise.

Definition at line 324 of file ptafunc2.c.

References ptaCopy(), ptaDestroy(), ptaJoin(), and ptaRemoveDupsByAset().

◆ ptaUnionByHash()

PTA* ptaUnionByHash ( PTA pta1,
PTA pta2 
)

ptaUnionByHash()

Parameters
[in]pta1,pta2
Returns
ptad with the union of the set of points, or NULL on error
Notes:
     (1) This is faster than ptaUnionByAset(), because the
         bucket lookup is O(n).  It should be used if the pts are
         integers (e.g., representing pixel positions).

Definition at line 500 of file ptafunc2.c.

References ptaCopy(), ptaDestroy(), ptaJoin(), and ptaRemoveDupsByHash().