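Is it possible to read every record in a dataset, count all records that share many (but not all) similar attributes, and then show those similar attributes?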

Updated: 2023-02-17 17:42:02

I know that in SQL you need to specify which attributes you want to select and compare; for example, you can count all records that have the same values in columns a, b and c.

What I'm asking is: is it possible to search all records to see which ones have similar attribute values, group them (say, group 1, with a count like "n results found"), and then show which columns/fields they have similar values in? I hope you get my point in this crazy question.

My dataset is composed mostly of textual data and time.

Most likely, this is what it looks like:
http://img14.imageshack.us/img14/6015/84e2.png

Thanks in advance.

Do you mean that you want to find the number of rows (groups) where you have two or more attributes that are identical? As below?

-- Some dummy data to demonstrate.
create table test
(
 a varchar(10) null,
 b varchar(10) null,
 c varchar(10) null,
 d varchar(10) null,
 e varchar(10) null,
 f varchar(10) null
)


insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','ethel','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','ethel','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wubble','womble')
insert into test (a,b,c,d,e,f) values ('fred','albert','sheila','wibble','wubble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')

-- Count the number of rows having identical values of a,b,c,d,e,f
select
    count(*) as groupCount,
    a,b,c,d,e,f
from test
group by
    a,b,c,d,e,f

groupCount  a          b          c          d          e          f
----------- ---------- ---------- ---------- ---------- ---------- ----------
1           fred       albert     sheila     wibble     wubble     womble
2           fred       ethel      sheila     wibble     wobble     womble
4           fred       jim        sheila     wibble     wobble     womble
1           fred       jim        sheila     wibble     wubble     womble

(4 row(s) affected)

-- Count the number of rows having identical values of b,c,d only
select
    count(*) as groupCount,
    b,c,d
from test
group by
    b,c,d

groupCount  b          c          d
----------- ---------- ---------- ----------
1           albert     sheila     wibble
2           ethel      sheila     wibble
5           jim        sheila     wibble

(3 row(s) affected)
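
If you don't know in advance which columns to group by, one option is to generate the same kind of GROUP BY query for every combination of columns and keep only the groups with more than one row. Below is a minimal sketch of that idea, not a definitive implementation. It assumes Python with the standard sqlite3 module standing in for your database, and the helper name shared_groups and the size parameter are made up for illustration.

```python
import sqlite3
from itertools import combinations

COLUMNS = ["a", "b", "c", "d", "e", "f"]

# Same dummy data as the insert statements above.
ROWS = [
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wubble", "womble"),
    ("fred", "albert", "sheila", "wibble", "wubble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
]


def shared_groups(conn, size):
    """Yield (columns, values, count) for each group of 2+ rows agreeing on `size` columns."""
    for cols in combinations(COLUMNS, size):
        # Column names come from the fixed COLUMNS list, never from user input.
        col_list = ", ".join(cols)
        sql = (
            f"select count(*), {col_list} from test "
            f"group by {col_list} having count(*) > 1"
        )
        for count, *values in conn.execute(sql):
            yield cols, tuple(values), count


conn = sqlite3.connect(":memory:")
conn.execute("create table test (a text, b text, c text, d text, e text, f text)")
conn.executemany("insert into test values (?, ?, ?, ?, ?, ?)", ROWS)

# Show every group of rows that agree on some set of 5 columns.
for cols, values, count in shared_groups(conn, 5):
    print(count, dict(zip(cols, values)))
```

Each printed line gives the group size and the columns (with their values) that the rows in that group have in common. The number of column combinations grows quickly with the number of columns, so this is only practical for narrow tables or a limited range of sizes.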


Well, your question is vague, but however it is interpreted, the answer is "yes".

I believe you want to search Google for "MapReduce" and apply it to your data as you read each of the records.
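
To make the MapReduce suggestion a bit more concrete, here is a rough single-process sketch in Python. It is one possible reading of the idea, not a real distributed job. The map step emits one key per column subset of each record, and the reduce step sums the counts per key. The names map_record, reduce_counts, similar_groups and the min_shared parameter are assumptions made up for this example.

```python
from collections import defaultdict
from itertools import combinations

COLUMNS = ("a", "b", "c", "d", "e", "f")


def map_record(record, min_shared):
    """Map step: emit ((columns, values), 1) for every column subset of at least min_shared columns."""
    for size in range(min_shared, len(COLUMNS) + 1):
        for idx in combinations(range(len(COLUMNS)), size):
            cols = tuple(COLUMNS[i] for i in idx)
            vals = tuple(record[i] for i in idx)
            yield (cols, vals), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, n in pairs:
        totals[key] += n
    return totals


def similar_groups(records, min_shared):
    """Return {(columns, values): count} for keys shared by more than one record."""
    pairs = (pair for rec in records for pair in map_record(rec, min_shared))
    totals = reduce_counts(pairs)
    return {key: n for key, n in totals.items() if n > 1}


records = [
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wubble", "womble"),
]

groups = similar_groups(records, min_shared=5)
for (cols, vals), n in groups.items():
    print(n, dict(zip(cols, vals)))
```

Each record is read once, which matches the "while reading each of the records" part, but every record emits one key per column subset, so the work per record grows exponentially with the number of columns. Raising min_shared keeps that manageable.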