Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Blame
Revision Log

cmset_op.pro @ 31

Last change on this file since 31 was 31, checked in by pinsard, 18 years ago
upgrade of MATRICE according to cerbere.lodyc.jussieu.fr: /usr/home/smasson/SAXO_RD/ : files
Property svn:executable set to ``*
File size: 14.1 KB

Line
1	;+
2	; NAME:
3	; CMSET_OP
4	;
5	; AUTHOR:
6	; Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
7	; craigm@lheamail.gsfc.nasa.gov
8	;
9	; PURPOSE:
10	; Performs an AND, OR, or XOR operation between two sets
11	;
12	; CALLING SEQUENCE:
13	; SET = CMSET_OP(A, OP, B)
14	;
15	; DESCRIPTION:
16	;
17	; SET_OP performs three common operations between two sets. The
18	; three supported functions of OP are:
19	;
20	; OP Meaning
21	; 'AND' - to find the intersection of A and B;
22	; 'OR' - to find the union of A and B;
23	; 'XOR' - to find the those elements who are members of A or B
24	; but not both;
25	;
26	; Sets as defined here are one dimensional arrays composed of
27	; numeric or string types. Comparisons of equality between elements
28	; are done using the IDL EQ operator.
29	;
30	; The complements of either set can be taken as well, by using the
31	; NOT1 and NOT2 keywords. For example, it may be desireable to find
32	; the elements in A but not B, or B but not A (they are different!).
33	; The following IDL expressions achieve each of those effects:
34	;
35	; SET = CMSET_OP(A, 'AND', /NOT2, B) ; A but not B
36	; SET = CMSET_OP(/NOT1, A, 'AND', B) ; B but not A
37	;
38	; Note the distinction between NOT1 and NOT2. NOT1 refers to the
39	; first set (A) and NOT2 refers to the second (B). Their ordered
40	; placement in the calling sequence is entirely optional, but the
41	; above ordering makes the logical meaning explicit.
42	;
43	; NOT1 and NOT2 can only be set for the 'AND' operator, and never
44	; simultaneously. This is because the results of an operation with
45	; 'OR' or 'XOR' and any combination of NOTs -- or with 'AND' and
46	; both NOTs -- formally cannot produce a defined result.
47	;
48	; The implementation depends on the type of operands. For integer
49	; types, a fast technique using HISTOGRAM is used. However, this
50	; algorithm becomes inefficient when the dynamic range in the data
51	; is large. For those cases, and for other data types, a technique
52	; based on SORT() is used. Thus the compute time should scale
53	; roughly as (A+B)ALOG(A+B) or better, rather than (AB) for the
54	; brute force approach. For large arrays this is a significant
55	; benefit.
56	;
57	; INPUTS:
58	;
59	; A, B - the two sets to be operated on. A one dimensional array of
60	; either numeric or string type. A and B must be of the same
61	; type. Empty sets are permitted, and are either represented
62	; as an undefined variable, or by setting EMPTY1 or EMPTY2.
63	;
64	; OP - a string, the operation to be performed. Must be one of
65	; 'AND', 'OR' or 'XOR' (lower or mixed case is permitted).
66	; Other operations will cause an error message to be produced.
67	;
68	; KEYWORDS:
69	;
70	; NOT1, NOT2 - if set and OP is 'AND', then the complement of A (for
71	; NOT1) or B (for NOT2) will be used in the operation.
72	; NOT1 and NOT2 cannot be set simultaneously.
73	;
74	; EMPTY1, EMPTY2 - if set, then A (for EMPTY1) or B (for EMPTY2) are
75	; assumed to be the empty set. The actual values
76	; passed as A or B are then ignored.
77	;
78	; INDEX - if set, then return a list of indices instead of the array
79	; values themselves. The "slower" set operations are always
80	; performed in this case.
81	;
82	; The indices refer to the combined array [A,B]. To
83	; clarify, in the following call: I = CMSET_OP(..., /INDEX);
84	; returned values from 0 to NA-1 refer to A[I], and values
85	; from NA to NA+NB-1 refer to B[I-NA].
86	;
87	; COUNT - upon return, the number of elements in the result set.
88	; This is only important when the result set is the empty
89	; set, in which case COUNT is set to zero.
90	;
91	; RETURNS:
92	;
93	; The resulting set as a one-dimensional array. The set may be
94	; represented by either an array of data values (default), or an
95	; array of indices (if INDEX is set). Duplicate elements, if any,
96	; are removed, and element order may not be preserved.
97	;
98	; The empty set is represented as a return value of -1L, and COUNT
99	; is set to zero. Note that the only way to recognize the empty set
100	; is to examine COUNT.
101	;
102	; SEE ALSO:
103	;
104	; SET_UTILS.PRO by RSI
105	;
106	; MODIFICATION HISTORY:
107	; Written, CM, 23 Feb 2000
108	; Added empty set capability, CM, 25 Feb 2000
109	; Documentation clarification, CM 02 Mar 2000
110	; Incompatible but more consistent reworking of EMPTY keywords, CM,
111	; 04 Mar 2000
112	; Minor documentation clarifications, CM, 26 Mar 2000
113	; Corrected bug in empty_arg special case, CM 06 Apr 2000
114	; Add INDEX keyword, CM 31 Jul 2000
115	; Clarify INDEX keyword documentation, CM 06 Sep 2000
116	; Made INDEX keyword always force SLOW_SET_OP, CM 06 Sep 2000
117	; Added CMSET_OP_UNIQ, and ability to select FIRST_UNIQUE or
118	; LAST_UNIQUE values, CM, 18 Sep 2000
119	; Removed FIRST_UNIQUE and LAST_UNIQUE, and streamlined
120	; CMSET_OP_UNIQ until problems with SORT can be understood, CM, 20
121	; Sep 2000 (thanks to Ben Tupper)
122	; Still trying to get documentation of INDEX and NOT right, CM, 28
123	; Sep 2000 (no code changes)
124	;
125	; $Id: cmset_op.pro,v 1.2 2001/03/25 18:10:42 craigm Exp $
126	;
127	;-
128	; Copyright (C) 2000, Craig Markwardt
129	; This software is provided as is without any warranty whatsoever.
130	; Permission to use, copy, modify, and distribute modified or
131	; unmodified copies is granted, provided this copyright and disclaimer
132	; are included unchanged.
133	;-
134
135	;; Utility function, similar to UNIQ, but allowing choice of taking
136	;; first or last unique element, or non-unique elements.
137	;; Unfortunately this doesn't work because of implementation dependent
138	;; versions of the SORT() function.
139
140	; function cmset_op_uniq, a, first=first, non=non, count=ct, sort=sortit
141	; if n_elements(a) LE 1 then return, 0L
142	; sh = (2Lkeyword_set(first)-1L)(-2L*keyword_set(non)+1)
143	;
144	; if keyword_set(sortit) then begin
145	; ;; Sort it manually
146	; ii = sort(a) & b = a(ii)
147	; if keyword_set(non) then wh = where(b EQ shift(b, sh), ct) $
148	; else wh = where(b NE shift(b, sh), ct)
149	; if ct GT 0 then return, ii(wh)
150	; endif else begin
151	; ;; Use the user's values directly
152	; if keyword_set(non) then wh = where(a EQ shift(a, sh), ct) $
153	; else wh = where(a NE shift(a, sh), ct)
154	; if ct GT 0 then return, wh
155	; endelse
156	;
157	; if keyword_set(first) then return, 0L else return, n_elements(a)-1
158	; end
159
160	;; Simplified version of CMSET_OP_UNIQ which sorts, and takes the
161	;; "first" value, whatever that may mean.
162	function cmset_op_uniq, a
163	if n_elements(a) LE 1 then return, 0L
164
165	ii = sort(a) & b = a(ii)
166	wh = where(b NE shift(b, +1L), ct)
167	if ct GT 0 then return, ii(wh)
168
169	return, 0L
170	end
171
172	function cmset_op, a, op0, b, not1=not1, not2=not2, count=count, $
173	empty1=empty1, empty2=empty2, maxarray=ma, index=index
174
175	on_error, 2 ;; return on error
176	count = 0L
177	index0 = -1L
178	;; Histogram technique is used for array sizes < 32,000 elements
179	if n_elements(ma) EQ 0 then ma = 32L*1024L
180
181	;; Check the number of arguments
182	if n_params() LT 3 then begin
183	ARG_ERR:
184	message, 'USAGE: SET = CMSET_OP(A, OP, B [, COUNT=ct])', /info
185	message, ' KEYWORDS: /NOT1, /NOT2, /EMPTY1, /EMPTY2, INDEX', /info
186	return, -1L
187	endif
188	if n_elements(op0) EQ 0 then goto, ARG_ERR
189	kind = keyword_set(index)
190	fst = 1L
191	if keyword_set(last) then fst = 0L
192	if keyword_set(first) then fst = 1L
193
194	;; Check the operation
195	sz = size(op0)
196	if sz(sz(0)+1) NE 7 then begin
197	OP_ERR:
198	message, "ERROR: OP must be 'AND', 'OR' or 'XOR'"
199	return, -1L
200	endif
201	op = strupcase(op0)
202	if op NE 'AND' AND op NE 'OR' AND op NE 'XOR' then goto, OP_ERR
203
204	;; Check NOT1 and NOT2
205	if keyword_set(not1) AND keyword_set(not2) then begin
206	message, "ERROR: NOT1 and NOT2 cannot be set simultaneously"
207	return, -1L
208	endif
209	if (keyword_set(not1) OR keyword_set(not2)) AND $
210	(op EQ 'OR' OR op EQ 'XOR') then begin
211	message, "ERROR: NOT1 and NOT2 cannot be set with 'OR' or 'XOR'"
212	return, -1L
213	endif
214
215	;; Special cases for empty set
216	n1 = n_elements(a) & n2 = n_elements(b)
217	if keyword_set(empty1) then n1 = 0L
218	if keyword_set(empty2) then n2 = 0L
219	if n1 EQ 0 OR n2 EQ 0 then begin
220	;; Eliminate duplicates
221	if n1 GT 0 then a1 = cmset_op_uniq(a)
222	if n2 GT 0 then b1 = cmset_op_uniq(b)
223	n1 = n_elements(a1) < n1 & n2 = n_elements(b1) < n2
224	case op of
225	'OR': if n1 EQ 0 then goto, RET_A1 else goto, RET_B1
226	'XOR': if n1 EQ 0 then goto, RET_B1 else goto, RET_A1
227	'AND': begin
228	if keyword_set(not1) AND n1 EQ 0 then goto, RET_B1
229	if keyword_set(not2) AND n2 EQ 0 then goto, RET_A1
230	return, -1L
231	end
232	endcase
233	return, -1L
234	RET_A1:
235	count = n1
236	if kind then begin
237	if count GT 0 then return, a1 else return, -1L
238	endif
239	if count GT 0 then return, a(a1) else return, -1L
240	RET_B1:
241	count = n2
242	if kind then begin
243	if count GT 0 then return, b1+n1 else return, -1L
244	endif
245	if count GT 0 then return, b(b1) else return, -1L
246	endif
247
248	;; Check types of operands
249	sz1 = size(a) & tp1 = sz1(sz1(0)+1)
250	sz2 = size(b) & tp2 = sz2(sz2(0)+1)
251	if tp1 NE tp2 then begin
252	TYPE1_ERR:
253	message, 'ERROR: both A and B must be of the same type'
254	return, -1L
255	endif
256	if tp1 EQ 8 OR tp1 EQ 10 AND tp1 EQ 11 then begin
257	TYPE2_ERR:
258	message, 'ERROR: operands must be a numeric or string type'
259	return, -1L
260	endif
261
262	;; Now use two different kinds of algorithms: a slower but more
263	;; general algorithm for generic types, and the histogram technique
264	;; for integer types. Even for integer types, if there is too much
265	;; dynamic range, then the slow method is used.
266
267	if tp1 GE 4 AND tp1 LE 9 then begin
268	;; String and real types, or large int arrays
269	SLOW_SET_OP:
270	case op of
271	'OR': begin
272	uu = [a,b] ;; OR is simple; just take unique values
273	index0 = cmset_op_uniq(uu)
274	count = n_elements(index0)
275	if kind then return, index0
276	return, uu(index0)
277	end
278
279	'XOR': begin
280	;; Make ordered list of set union
281	ai = cmset_op_uniq(a) & na = n_elements(ai)
282	bi = cmset_op_uniq(b) & nb = n_elements(bi)
283	ui = [ai, bi+n1]
284	uu = [a,b] & uu = uu(ui) ;; Raw union...
285	us = sort(uu) & uu = uu(us) ;; ...and sort
286	if kind then ui = ui(temporary(us)) else ui = 0
287
288	;; Values in one set only will not have duplicates
289	wh1 = where(uu NE shift(uu, -1), count1)
290	if count1 EQ 0 then return, -1L
291	wh = where(wh1(1:*)-wh1 EQ 1, count)
292	if wh1(0) EQ 0 then begin
293	if count GT 0 then wh = [-1L, wh] else wh = [-1L]
294	count = n_elements(wh)
295	endif
296	if count EQ 0 then return, -1
297	if kind then return, ui(wh1(wh+1))
298	return, uu(wh1(wh+1))
299	end
300
301	'AND': begin
302	;; Make ordered list of set union
303	ai = cmset_op_uniq(a) & na = n_elements(ai)
304	bi = cmset_op_uniq(b) & nb = n_elements(bi)
305	ui = [ai, bi+n1]
306	uu = [a,b] & uu = uu(ui) ;; Raw union...
307	us = sort(uu) & uu = uu(us) ;; ...and sort
308	if kind then ui = ui(us) else ui = 0
309
310	if NOT keyword_set(not1) AND NOT keyword_set(not2) then begin
311	;; If neither "NOT" is set, then find duplicates
312	us = 0L ;; Save memory
313	wh = where(uu EQ shift(uu, -1L), count) ;; Find non-unique
314	if count EQ 0 then return, -1L
315	;; This should always select the element from A
316	;; rather than B (the smaller of the two)
317	if kind then return, (ui(wh) < ui(wh+1))
318	return, uu(wh)
319	endif
320
321	;; For "NOT" cases, we need to identify by set
322	ii = make_array(na+nb, value=1b)
323	if keyword_set(not1) then ii(0:na-1) = 0
324	if keyword_set(not2) then ii(na:*) = 0
325	ii = ii(temporary(us))
326
327	;; Remove any duplicates
328	wh1 = where(uu EQ shift(uu, -1L), count1) ;; Find non-unique
329	if count1 GT 0 then ii([wh1, wh1+1]) = 0
330	;; Remainder is the desired set
331	wh = where(ii, count)
332	if count EQ 0 then return, -1L
333	if kind then return, ui(wh)
334	return, uu(wh)
335	end
336
337	endcase
338	return, -1L ;; DEFAULT CASE
339	endif else begin
340
341	;; INDEX keyword forces the "slow" operation
342	if kind then goto, SLOW_SET_OP
343
344	;; Integer types - use histogram technique if the data range
345	;; is small enough, otherwise use the "slow" technique above
346	min1 = min(a, max=max1) & min2 = min(b, max=max2)
347	minn = min1 < min2 & maxx = max1 > max2
348	nbins = maxx-minn+1
349	if (maxx-minn) GT floor(ma(0)) then goto, SLOW_SET_OP
350
351	;; Following operations create a histogram of the integer values.
352	ha = histogram(a, min=minn, max=maxx) < 1
353	hb = histogram(b, min=minn, max=maxx) < 1
354
355	;; Compute NOT cases
356	if keyword_set(not1) then ha = 1b - ha
357	if keyword_set(not2) then hb = 1b - hb
358	case op of
359	;; Boolean operations
360	'AND': mask = temporary(ha) AND temporary(hb)
361	'OR': mask = temporary(ha) OR temporary(hb)
362	'XOR': mask = temporary(ha) XOR temporary(hb)
363	endcase
364
365	wh = where(temporary(mask), count)
366	if count EQ 0 then return, -1L
367
368	return, wh+minn
369
370	endelse
371
372	return, -1L ;; DEFAULT CASE
373	end
374
375	; Here is how I did the INDEX stuff with fast histogramming. It
376	; works, but is complicated, so I forced it to go to SLOW_SET_OP.
377	; ha = histogram(a, min=minn, max=maxx, reverse=ra) < 1
378	; rr = ra(0:nbins) & mask = rr NE rr(1:) & ra = ra(rr)mask-1L+mask
379	; hb = histogram(b, min=minn, max=maxx, reverse=rb) < 1
380	; rr = rb(0:nbins) & mask = rr NE rr(1:) & rb = rb(rr)mask-1L+mask
381	; ... AND/OR/XOR NOT masking here ...
382	; ra = ra(wh) & rb = rb(wh)
383	; return, ra(ra GE 0) + (rb+n1)(ra LT 0) ;; is last 'ra' right?
384

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format